This is the UCS three-afternoon course on Python for people who have no experience
of programming at all. We warn those who do have some programming
experience, and who are here just to add the Python notch to their bed post, that they
will be excruciatingly bored in this course. Those who already know how to
program in another language and want to learn Python are better off attending the
UCS “Python: Introduction for Programmers” one-day course. For details of this
course, see http://training.csx.cam.ac.uk/course/python4progs

Note that the UCS Python courses cover Python 2.4 to 2.6, which are the most
common versions currently in use. They do NOT cover the recently released Python
3.0, since that version of Python is so new. In some places Python 3.0 is significantly
different to Python 2.x, and this course will be updated to cover it as it becomes more
widely used.

The official UCS e-mail address for all scientific computing support queries, including
any questions about this course, is: scientific-computing@ucs.cam.ac.uk

So what will this course cover?

We  will  start  with  a  brief  introduction  to  Python,  looking  briefly  at  what  it  is  used  for  and 
how  we  launch  it  on  the  systems  being  used  for  this  course. 

Once  we  have  it  running  we  will  start  by  using  it  as  a  glorified  calculator  to  get  us  used 
to  its  features.  We  will  examine  how  it  handles  numbers,  text  and  the  concept  of  a 
statement  being  true  or  false. 

But Python is there for us to use as a programming language so, after spending a
while  using  it  as  a  manually  operated  calculator,  we  will  start  to  use  it  as  a  fully-fledged 
programming  language. 

As part of this we will look at how Python stores values and assigns names to these
stored values. We will look at the three fundamental constructs that will allow us to
build programs that actually do something: “if... then... else...”, “while... loops” and
“for... loops”.

We  will  also  spend  a  lot  of  time  looking  at  how  Python  handles  lists.  There  are  two 
reasons  for  this.  First,  Python  uses  lists  a  lot  so  we  need  to  understand  them.  Second, 
Python  lists  are  the  first  example  of  a  computer  data  structure  that  doesn't  have  any 
analogue  in  the  usual  arithmetics. 

Then we will look at writing our own functions that use what we have learnt. Functions
permit us to structure our code in a more maintainable fashion. We will look at how
Python groups related functions together and what groups of functions it provides
ready-made. These groups are called “modules” in Python terminology.

Once we know the rudiments of programming in Python we will look at the support
functions  offered  by  the  base  Python  system.  These  will  let  us  access  the  system 
outside  of  Python.  The  main  example  of  this  will  be  accessing  the  file  system. 
Finally  we  will  look  at  one  last,  very  powerful  mechanism  for  storing  data,  the 
“dictionary”. 

I want to start by convincing you that learning Python is worthwhile. Python is used for
every  scale  of  operation.  Here  is  a  spectrum  of  examples  running  from  the  largest  to 
the  smallest. 

The Massively Multiplayer Online Role-Playing Game (MMORPG) “Eve Online”
supports over 300,000 users with a Python back end.
http://wiki.python.org/moin/PyCon2006/Talks#line-196

Two  very  common  frameworks  for  web  applications  are  Django  (general  purpose)  and 
Plone  (content  management).  Both  are  implemented  in  Python. 
www.djangoproject.com  plone.org 

On the desktop itself there are frameworks to build graphical applications in Python.
The two standard Unix desktop environments are GNOME and KDE, built on the GTK+
and Qt toolkits respectively. Both have Python support. There is similar support under
Windows and MacOS.
www.pygtk.org  www.pyside.org  www.wxpython.org

There are plenty of command line programs written in Python. Some Unixes
(e.g. OpenSUSE) have a helper program they call when the user asks for a command
the shell doesn't know. That helper program is written in Python.

Within programs there are support libraries for almost every purpose, including a very
powerful scientific Python library called “SciPy” (“Sigh-Pie”) and an underlying
numerical library called “NumPy”.
www.scipy.org

Python is also used to control instruments (a simple robot is featured in the slide) and
in embedded systems. The card shown is an “...IEEE802.15.4 based, auto-forming,
multi-hop, instant-on, mesh network stack combined with an embedded
Python interpreter for running application code.”
synapse-wireless.com

Languages split into two broad camps according to how they are used, though it is
better  regarded  as  a  spectrum  rather  than  a  clean  split. 

Compiled  languages  go  through  a  “compilation”  stage  where  the  text  written  by  the 
programmer  is  converted  into  machine  code.  This  machine  code  is  then  processed 
directly  by  the  CPU  at  a  later  stage  when  the  user  wants  to  run  the  program.  This  is 
called,  unsurprisingly,  “run  time”.  Fortran,  C  and  C++  are  examples  of  languages  that 
are  treated  in  this  way. 

Interpreted languages are stored as the text written by the programmer and this is
read by another program, called the interpreter, typically one line at a time. The line is
read and parsed by the interpreter, which then executes any instructions required itself.
Then  it  moves  on  to  the  next  line.  Note  that  the  interpreter  is  typically  a  compiled 
program  itself. 

There  are  some  languages  which  occupy  the  middle  ground.  Java,  for  example,  is 
converted  into  a  pseudo-machine-code  for  a  CPU  that  doesn’t  actually  exist.  At  run 
time  the  Java  environment  emulates  this  CPU  in  a  program  which  interprets  the 
supposed  machine  code  in  the  same  way  that  a  standard  interpreter  interprets  the 
plain  text  of  its  program.  In  the  way  Java  is  treated  it  is  closer  to  a  compiled  language 
than  a  classic  interpreted  language  so  it  is  treated  as  a  compiled  language  in  this 
course. 

Python  can  create  some  intermediate  files  to  make  subsequent  interpretation  simpler. 
However,  there  is  no  formal  “compilation”  phase  the  user  goes  through  to  create  these 
files  and  they  get  automatically  handled  by  the  Python  system.  So  in  terms  of  how  we 
use  it,  Python  is  a  classic  interpreted  language.  Any  clever  tricks  it  pulls  behind  the 
curtains  will  be  ignored  for  the  purposes  of  this  course. 

So, if an interpreted language takes text programs and runs them directly, where does
it  get  its  text  from?  Interpreted  languages  typically  support  getting  their  text  either 
directly  from  the  user  typing  at  the  keyboard  or  from  a  text  file  of  commands,  often 
called  a  “script”. 

If  the  interpreter  (Python  in  our  case)  gets  its  input  from  the  user  then  we  say  it  is 
running  “interactively”.  If  it  gets  its  input  from  a  file  we  say  it  is  running  in  “batch 
mode”.  We  tend  to  use  interactive  mode  for  simple  use  and  batch  for  anything 
complex. 

To get a terminal window to type commands into, launch the GNOME Terminal
application from the menu system:

Applications → Unix Shell → GNOME Terminal

In  the  Unix  command  line  interpreter  we  issue  the  command  to  launch  the  Python 
interpreter.  That  command  is  the  single  word,  “python”. 

In  these  notes  we  show  the  Unix  prompt,  the  hint  from  the  Unix  system  that  it  is  ready 
to  receive  commands,  as  a  single  dollar  character  ($).  On  PWF  Linux  the  prompt  is 
actually  that  character  preceded  by  some  other  information. 

Our  other  convention  in  these  notes  is  to  indicate  with  the  use  of  bold  face  the  text  that 
you  have  to  type  while  regular  type  face  is  used  for  the  computer’s  output. 

The  interactive  Python  interpreter  starts  by  printing  three  lines  of  introductory  blurb 
which  will  not  be  of  interest  to  us.  For  completeness  what  they  mean  is  this: 

1.  The  version  of  Python  this  is. 

2.  The  version  of  the  C  compiler  the  interpreter  was  compiled  with. 

3.  A  few  hints  as  to  useful  commands  to  run. 

After this preamble, though, it prints a Python prompt. This consists of three “greater
than” characters (>>>) and is the indication that the Python interpreter is ready for you
to type some Python commands. You cannot type Unix commands at the prompt.
(Well, you can type them but the interpreter won’t understand them.)

So let’s issue our first Python command. There’s a tradition in computing that the first
program  developed  in  any  language  should  output  the  phrase  “Hello,  world!”  and  we 
see  no  reason  to  deviate  from  the  norm  here. 

The  Python  command  to  output  some  text  is  “print”.  This  command  needs  to  be 
followed  by  the  text  to  be  output.  The  information  that  is  passed  to  the  function  like  this 
is  called  its  “arguments”.  In  our  case  there  is  only  one  argument.  Arguments  are 
passed  in  brackets  to  group  them  together. 

(Actually, in Python the print function is a special case for historical reasons, and
doesn't need the brackets. However, this special exemption is scheduled for removal
in the next version of Python so we encourage you to get in the habit of using them
from the start.)

The  text,  “Hello,  world!”  is  surrounded  by  single  quotes  (')  to  indicate  that  it  should  be 
considered  as  text  by  Python  and  not  some  other  commands  or  Python  keywords. 

The  command  is  executed  and  the  text  “Hello,  world!”  is  produced.  The  print 
command  always  starts  a  new  line  after  outputting  its  text.  Note  that  the  quotes  were 
used  to  indicate  to  Python  that  their  contents  were  text  but  they  are  not  part  of  the  text 
itself  so  are  not  printed  out  as  part  of  the  print  command's  output. 

Once the command is complete the Python interpreter is ready for another command
so prompts for it with the same triple chevron (“greater than” sign) marker, “>>>”.

Note  that  everything  in  Python  is  case-sensitive:  you  have  to  give  the  print  command 
all  in  lower-case;  “PRINT”,  “pRiNt”,  etc.  won’t  work. 
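
As a small sketch of the case-sensitivity rule, the lines below call print correctly and then try the upper-case spelling PRINT. The try/except wrapper and the name upper_case_failed are not part of these notes; they are used here only so that the failed call reports its error instead of stopping the run.

```python
print('Hello, world!')      # lower case: works

upper_case_failed = False
try:
    PRINT('Hello, world!')  # upper case: Python knows no such name
except NameError as error:
    upper_case_failed = True
    print(error)            # Python's own description of the problem
```

Typed directly at the interactive prompt, the upper-case line would produce the same NameError message.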

We will continue in our use of this interactive Python session.

We issue a trivial command:

>>> print(3)

and  Python  faithfully  prints  the  number 
3 

to  the  terminal. 

If,  however,  we  just  type  a  bare  number: 

>>> 5

then  Python  evaluates  whatever  it  has  been  given  and  also  outputs  the  result  of  that 
evaluation: 

5 

Then  Python  prompts  for  more  input. 

There  is  a  subtle  difference  in  the  two  behaviours.  In  the  first  case  we  explicitly  told 
Python  to  print  a  value.  In  the  second  we  gave  it  a  value  and  it  responds,  essentially 
saying  “yup,  that's  a  5”. 

We  can  take  this  further.  We  will  meet  numbers  shortly  but  note  for  now  that  the 
“evaluation”  need  not  always  be  trivial.  We  can  use  Python  to  evaluate  expressions. 

The difference is more explicit if we use text rather than numbers.

In  the  first  case  we  use  the  quotes  to  mark  their  content  as  text.  When  we  ask  Python 
to  print  some  text  it  prints  just  the  text  itself  without  any  syntactic  markers.  So  the  print 
example  has  no  quotes  in  its  output. 

In the second case we hand this text object to Python and it says “yup, this is a text
object containing this sequence of characters”. The way it indicates that it is a text
object is by enclosing it in quotes. It uses exactly the same marker as we did.
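
The two behaviours can be compared side by side in a script with the built-in repr function, which produces the same quoted form that the interactive prompt displays. This is a minimal sketch; the name text is chosen here for illustration, and naming values is covered properly later in the course.

```python
text = 'Hello, world!'

print(text)        # the text itself, without quotes
print(repr(text))  # the text as the interactive prompt would display it
```

The first line of output has no quotes and the second is enclosed in single quotes.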

Now that we know how to get into Python we need to know how to get out of it again.
In common with many Unix commands that read input from the keyboard, the program
can be quit by indicating “end of input”. This is done with “[Ctrl]+[D]”. To do
this, hold down the control key (typically marked “Ctrl”) and tap the “D” key once.
Then release the control key.

Be careful to press the “D” key only once. The [Ctrl]+[D] key combination,
meaning “end of input” or “end of file”, also means this to the underlying Unix
command interpreter. If you press [Ctrl]+[D] twice, the first kills off Python,
returning  control  to  the  Unix  command  line  and  the  second  kills  that  off.  If  the  entire 
terminal  window  disappears  then  this  is  what  you  have  done  wrong.  Start  up  another 
window,  restart  Python  and  try  again. 

If  you  are  running  Python  interactively  on  a  non-Unix  platform  you  may  need  a 
different  key  combination.  If  you  type  “exit”  at  the  Python  prompt  it  will  tell  you  what 
you  need  to  do  on  the  current  platform.  On  PWF  Linux  you  get  this: 

>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit
>>>

If you do not feel comfortable using [Ctrl]+[D] then you can run the Python
command exit() instead.


Exercise 

1.  Launch  a  terminal  window. 

2.  Launch  Python. 

3.  Print  out  “Hello,  world!” 

4.  Run  these  Python  expressions  (one  per  line): 

(a)  42 

(b)  26+18 

(c)  26<18 

(d)  26>18 

5.  Exit  Python  (but  not  the  terminal  window). 

2 minutes


Here's  a  quick  exercise.  It  shouldn't  take  you  too  long,  but  if  you  get  stuck  do  get  the 
demonstrator's  attention  and  ask. 

The  answers  to  4(a)  and  4(b)  should  come  as  no  surprise.  The  answers  to  4(c)  and 
4(d)  will  be  new  but  we  will  cover  them  later  in  this  course. 

If  you  accidentally  quit  your  terminal  window  as  well  as  your  Python  session  then  you 
need  more  practice  with  Control  characters.  Launch  another  terminal  window,  launch 
Python  in  it  and  have  another  go  at  exiting  cleanly. 

If you rush through this exercise and are left with 2 minutes 30 seconds of
thumb-twiddling time here are some more exercises:

A. Try to predict what each of these interactive Python commands will result in.
Then try them for real. Were you right?

>>> 99 - 100
>>> 123456789 + 987654322
>>> 99 > 100

B. The first of these commands works. The second gives an error. Why do you
think it fails? (We will address this when we cover text properly later.)

>>> print('Dowling')
>>> print('O'Connor')


Writing  Python  scripts 


Now that we have seen Python used interactively (though in a very limited capacity) we should
look  at  it  being  used  in  batch  mode:  on  files  of  Python  commands.  To  read  and  write 
these  files  we  will  use  a  simple  editor  in  this  course  called  “gedit”.  If  you  already 
know  a  different  Unix  plain  text  editor  you  are  welcome  to  use  it,  but  the  course  notes 
and the lecturer will use gedit. A handout is provided with a quick guide on how to
use it.

To launch gedit on PWF Linux select
Applications → Word and Text Processing → gedit
from the menus.

Please  be  careful.  The  gedit  application  edits  plain  text  files.  Some  of  these  (and 
most  for  our  purposes)  will  be  Python  scripts,  but  it  has  nothing  to  do  with  Python 
itself.  It  is  just  a  text  editor. 

So, once we have a script (as we can see in gedit) we need to run it. We do this in the
terminal window by running the python command just as we did interactively but this
time we add the name of the script file we want it to run.

$ python hello.py
Hello, world!
$

Please  keep  the  text  editor  and  the  terminal  window  separate  in  your  mind. 


Launching Python scripts

Note that Python runs the command inside the file just as if it had been typed
interactively.  The  only  difference  is  that  this  time  Python  does  not  print  the  three  lines 
of  introductory  blurb  and  exits  automatically  once  the  script  is  complete.  We  go 
straight  back  to  the  Unix  prompt;  we  do  not  need  to  quit  from  Python  ourselves. 


$ python three.py
3                     No “5”!

We will use this representation of file contents rather than screenshots in future slides.

There is another difference between interactive and batch mode which we can see
with the script three.py.

$ python three.py
3

Not  only  does  batch  mode  drop  the  introductory  blurb  but  it  also  drops  the  output  of 
values.  Unless  there  is  an  explicit  output  command,  Python  in  batch  mode  is  silent. 
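
The sketch below reproduces this behaviour from within Python itself. The contents of three.py are not shown in these notes, so the two-line script used here (an explicit print followed by a bare value) is an assumption, as are the names script_text, script_path and output. The tempfile and subprocess modules are used only to create the script and run it in batch mode; they are not part of this course.

```python
import os
import subprocess
import sys
import tempfile

# Assumed contents of three.py: one explicit print and one bare value.
script_text = "print(3)\n5\n"

# Write the script to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(script_text)
    script_path = handle.name

try:
    # Run the script in batch mode, using the interpreter running this sketch.
    output = subprocess.check_output([sys.executable, script_path])
finally:
    os.remove(script_path)

# Only the explicitly printed 3 appears; the bare 5 is silent.
print(output.decode())
```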

Those are the only two differences between interactive and batch mode Python.
Apart  from  that,  it's  just  a  case  of  what's  more  convenient. 


Progress 

What  Python  is 

Who  uses  Python 

How  to  run  Python  interactively 

How  to  run  a  Python  script 


Exercise 

1.  Launch  a  terminal  window. 

2. Run hello.py as a script.

3. Edit hello.py.

Change  “Hello”  to  “Goodbye”. 

4.  Run  it  again. 


2 minutes


Here's  an  exercise  to  make  sure  you  can  run  scripts  and  also  edit  them. 


Types of values

Numbers 

Whole  numbers 

Decimal  numbers 

Text 

“Boolean” 

True 

False 

We are going to start by using Python as a glorified calculator. To do that we need to
know  a  bit  about  the  sorts  of  things  we  will  be  calculating  with.  We  need  to  know  a 
little  about  how  Python  handles  its  various  values. 

In computing, values get divided up into “types”. So the number 3 is not the same as
the character “3”. These have different types.
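
A short sketch of why the distinction matters: the same “+” sign does different things to the two types. The names number and character are chosen here for illustration, and joining pieces of text with “+” is something the course covers properly later.

```python
number = 3        # the number three
character = '3'   # the character "3", marked as text by its quotes

print(number + number)          # arithmetic: 6
print(character + character)    # text joined end to end: 33
print(number == character)      # False: a number never equals a piece of text
```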

We  will  start  by  looking  at  just  a  few  types.  These  will  be  plenty  to  get  us  a  long  way. 

We  will  look  at  numbers,  both  whole  numbers  and  decimal  numbers,  we  will  look  at 
text  and  we  will  look  at  so-called  “boolean”  values.  These  are  what  the  Python  system 
uses  to  record  “true”  and  “false”.  We  will  see  them  in  detail  shortly. 


Integers

ℤ = { ..., -3, -2, -1, 0, 1, 2, 3, ... }


We will start with the integers, i.e. the “whole numbers” (0, the positive whole numbers
and the negative whole numbers): ℤ = {..., -3, -2, -1, 0, 1, 2, 3, ...}.

The  letter  Z  (with  the  double  diagonal  stroke)  is  the  mathematical  symbol  for  the 
whole  numbers,  known  mathematically  as  the  “integers”. 


>>> 4+2
6           Addition behaves as you might expect it to.
>>> 3 + 5
8           Spaces around the “+” are ignored.


If  we  type  “4+2”  at  the  Python  prompt  it  is  evaluated  and  returned  as  “6”.  There’s  no 
great  surprise  there.  It  should  be  noted  that  Python  doesn’t  care  about  spaces  or  the 
lack  of  them  around  the  plus  sign,  or  before  or  after  the  integers  for  that  matter. 


>>> 4-2
2           Subtraction also behaves as you might expect it to.
>>> 3 - 5
-2


Subtraction also behaves in a similar fashion, with negative numbers represented with
a leading minus sign.


>>> 4*2
8
>>> 3*5
15          Multiplication uses a “*” instead of a “×”.


We see our first deviation from “obvious” with multiplication. The plus and minus signs
appear on the standard keyboard so can be used by programming languages. The
times sign, “×”, does not appear on the keyboard so traditionally in computing the
asterisk, “*”, is used instead. (Actually Linux systems with UK keyboards can get “×”
as [Shift]+[AltGr]+[,].)


>>> 4/2
2           Division uses a “/” instead of a “÷”.
>>> 5/3
1           Division rounds down.
>>> -5/3
-2          Strictly down.


Similarly, division uses the forward slash character, “/”, rather than “÷”.

Division is the first place where Python’s integer arithmetic differs from conventional
maths. We are working in integers and Python remains within integers for the results
too, so if the division would give a fractional answer Python rounds down to give an
integer value. So the expression “5/3” gives “1” rather than “1 2/3”. Note that the
“round down” rule is applied absolutely. As a result “-5/3” is evaluated to be “-2”,
which is the integer below “-1 2/3”. So (-5)/3 does not evaluate to the same as -(5/3).

This sort of integer division is also known as “floor division”.

(Again, “÷” is [Shift]+[AltGr]+[.] on a Linux system with a UK keyboard, if you are
interested.)
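
The behaviour described above is that of “/” on integers in the Python 2.x versions this course covers. In Python 3 the same “/” gives a floating point answer instead. The sketch below therefore uses the “//” operator, which performs floor division in both Python 2.x (from 2.2 onwards) and Python 3, so the results match the notes whichever version runs it.

```python
print(5 // 3)      # 1
print(-5 // 3)     # -2, the integer below -1 2/3
print(-(5 // 3))   # -1, so the position of the minus sign matters
```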


>>> 4**2
16          Raising to powers uses “4**2” instead of “4²”.
>>> 5 ** 3
125         Spaces around the “**” allowed, but not within it.


The next mathematical operator we will describe for integers is raising to powers (this
is known as “exponentiation”). In classical arithmetic notation this is represented by
the use of superscripts, so “4 to the power of 2” is written “4²”. However, this cannot
be represented on a standard keyboard so instead a different notation is used. We
write 4² as “4**2”. You are permitted spaces around the “**” but not inside it, i.e. you
cannot separate the two asterisks with spaces.

Some programming languages use “^” for this operator rather than “**”. Python,
however, uses “**” for this, and uses “^” for something completely different that we will
not encounter in this introductory course.
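
A brief sketch of the difference, for the curious. In Python “^” is the bitwise exclusive-or operator, which works on the binary digits of its two numbers, so using it by mistake gives an answer rather than an error.

```python
print(4 ** 2)   # 16: four squared
print(5 ** 3)   # 125: five cubed
print(4 ^ 2)    # 6, not 16: "^" is not exponentiation in Python
```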


>>> 4%2
0           4 = 2×2 + 0        Remainder uses a “%”.
>>> 5 % 3
2           5 = 1×3 + 2
>>> -5 % 3
1           -5 = -2×3 + 1      Always zero or positive


There is one integer operator used in computing which does not have a classical
equivalent symbol. The percent character is used to determine remainders. “5%3”
gives the answer “2” because 5 leaves a remainder of 2 when divided by 3. Provided
the number after the percent character is positive, the remainder is always zero or
positive, even when the number in front of the percent character is negative.

We  won't  be  using  this  operator  in  the  course;  it  is  included  merely  for  completeness. 
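
The remainder and the rounded-down quotient always fit together: multiplying the quotient by the divisor and adding the remainder gives back the original number. A minimal sketch, using “//” for floor division so that it behaves the same in Python 2.x and Python 3; the names a, b, quotient and remainder are chosen here for illustration.

```python
a = -5
b = 3

quotient = a // b     # rounds down
remainder = a % b     # what is left over

print(quotient)                   # -2
print(remainder)                  # 1
print(quotient * b + remainder)   # -5, the number we started with
```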


How far can integers go?

>>> 2*2
4
>>> 4*4
16
>>> 16 * 16
256
>>> 256 * 256
65536       So far, so good...


Python’s integer arithmetic is very powerful and there is no limit (except the system’s
memory capacity) to the size of integer that can be handled. We can see this if we
start with 2, square it, get an answer and square that, and so on. Everything seems
normal up to 65,536.


>>> 65536 * 65536
4294967296L                 Long integer
>>> 4294967296 * 4294967296
18446744073709551616L
>>> 18446744073709551616 * 18446744073709551616
340282366920938463463374607431768211456L

No limit to size of Python's integers!


If  we  square  that  Python  gives  us  an  answer,  but  the  number  is  followed  by  the  letter 
“L”.  This  indicates  that  Python  has  moved  from  standard  integers  to  “long”  integers 
which  have  to  be  processed  differently  behind  the  scenes  but  which  are  just  standard 
integers  for  our  purposes.  Just  don’t  be  startled  by  the  appearance  of  the  trailing  “L”. 

We  can  keep  squaring,  limited  only  by  the  base  operating  system’s  memory.  Python 
itself  has  no  limit  to  the  size  of  integer  it  can  handle. 

Note:  If  you  are  using  a  system  with  a  64-bit  CPU  and  operating  system  then  the 
number  just  over  four  billion  also  comes  without  an  “L”  and  it  kicks  in  one  squaring 
later. 
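
The repeated squaring can also be left to Python. The sketch below uses a “for” loop, which the course introduces later, and a name, value, chosen here for illustration. Note that print never shows the trailing “L”; it appears only when the interactive prompt displays a long integer, and Python 3 has dropped it altogether.

```python
value = 2
for step in range(7):        # square seven times in a row
    value = value * value
    print(value)
```

The last line printed is 340282366920938463463374607431768211456, the same number reached by hand above.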


Slide (diagram not reproduced): the fixed-size integer types of C (int, long, long long)
and Fortran (up to INTEGER*16) set against Python's integers.
340282366920938463463374607431768211456 is out of the reach of C or Fortran!


It  is  worth  mentioning  that  Python  is  quite  exceptional  in  this  regard.  C  and  Fortran 
have  strict  limits  on  the  size  of  integer  they  will  handle.  C++  and  Java  have  the  same 
limits  as  C  but  do  also  have  the  equivalent  of  Python’s  “long  integers”  as  well. 
However,  in  C++  and  Java  you  must  take  explicit  action  to  invoke  so-called  “big 
integers”;  they  are  not  engaged  automatically  or  transparently  as  they  are  in  Python. 

Recent  versions  of  C  have  a  “long  long”  integer  type  which  you  can  use  to  get 
values  as  large  as  18,446,744,073,709,551,615.  Square  it  one  more  time  and  Python 
can  still  beat  them. 
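
As a quick check of that figure, assuming the usual 64-bit width for C's largest “long long” type, the quoted limit is one less than 2 to the power 64. The name largest_c_value is chosen here for illustration.

```python
largest_c_value = 18446744073709551615   # the limit quoted above

print(largest_c_value == 2 ** 64 - 1)    # True
print(largest_c_value * largest_c_value) # Python carries on regardless
```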


Progress

Whole numbers               ..., -2, -1, 0, 1, 2, ...
No support for fractions    1/2 → 0
Unlimited range of values
Mathematical operations

Maths:    a+b   a-b   a×b   a÷b   aᵇ     a mod b
Python:   a+b   a-b   a*b   a/b   a**b   a%b

Exercise

In Python, calculate:

1. 12+4
2. 12+5
3. 12-4
4. 12-5
5. 12×4
6. 12×5
7. 12÷4
8. 12÷5
9. 12⁴
10. 12⁵

Which of these answers is “wrong”?

Here are some simple integer sums to do in Python. By “wrong” I mean that the
integer  answer  from  Python  does  not  equal  the  mathematical  non-integer  answer. 


Floating point numbers

1     1.0
1¼    1.25
1½    1.5


And  that  wraps  it  up  for  integers. 

Next  we  would  like  to  move  on  to  real  numbers,  i.e.  the  whole  numbers  and  all  the 
values  in  between,  so  that  we  can  cope  with  divisions  that  give  fractional  answers  and 
other  more  complex  mathematical  operations  that  need  more  than  the  integers. 

Python implements a scheme to represent real numbers called “floating point
numbers”. Some non-integer numbers can be represented exactly in this scheme. Two
examples are ¼ and 1½. Most numbers can't be.

Incidentally,  there  is  an  alternative  approximation  called  “fixed  point  numbers”  but 
most  programming  languages,  including  Python,  don’t  implement  that  so  we  won’t 
bother  with  it. 


But...

1⅓    1.3
      1.33
      1.333
      1.3333   ?


But what about an equally “simple” fraction, 4/3? In normal mathematical
representation we express this approximately as a decimal expansion to a certain
number of places. This is the approach computers take, typically specifying the
number of decimal places they will work to in advance.

(ℝ is the mathematical symbol for the real numbers.)

If  you  are  going  to  be  doing  numerically  intensive  work  you  should  have  a  look  at  the 
article  “The  Perils  of  Floating  Point”  by  Bruce  M.  Bush,  available  on-line  at: 

http://www.lahey.com/float.htm 

This  article  will  tell  you  more  about  the  downright  weird  behaviour  of  floating  point 
numbers  and  the  kinds  of  problems  this  can  cause  in  your  programs.  Note,  however, 
that  all  the  examples  in  this  article  are  in  Fortran,  but  everything  the  article  discusses 
is  as  relevant  to  Python  as  it  is  to  Fortran. 


>>> 1.0
1.0         1 is OK
>>> 0.5
0.5         ½ is OK        Powers of two.
>>> 0.25
0.25        ¼ is OK
>>> 0.1
0.1         1/10 is not!   Why?


We represent floating point numbers by including a decimal point in the notation. “1.0”
is the floating point number “one point zero” and is quite different from the integer “1”.
(We can specify this to Python as “1.” instead of “1.0” if we wish.)

The floating point system can cope with moderate integer values like 1.0, 2.0 and so
on, but has a harder time with simple fractions.


0 . 1  1/10  is  stored 

inaccurately. 

0.1  +  0.1  +  0.1 

0 . 30000000000000004 

Floating  point  numbers  are... 

...printed  in  decimal 
...stored  in  binary 

17  significant  figures 

ucs 


»> 

0.1 

»> 


Even  with  simple  numbers  like  this,  though,  there  is  a  catch.  We  use  “base  ten” 
numbers  but  computers  work  internally  in  base  two.  So  fractions  that  are  powers  of 
two  (half,  quarter,  eighth,  etc.)  can  all  be  handled  exactly  correctly.  Fractions  that 
aren’t, like a tenth for example, are approximated internally. We see a tenth (0.1) as
simpler than a third (0.333333333...) only because we write in base ten. In base two a
tenth is the infinitely repeating fraction 0.00011001100110011... Since the computer
can  only  store  a  finite  number  of  digits,  numbers  such  as  a  tenth  can  only  be  stored 
approximately.  So  whereas  in  base  ten,  we  can  exactly  represent  fractions  such  as  a 
half,  a  fifth,  a  tenth  and  so  on,  with  computers  it’s  only  fractions  like  a  half,  a  quarter, 
an  eighth,  etc.  that  have  the  privileged  status  of  being  represented  exactly. 
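
We can check this at the prompt. The following is a small sketch added for illustration (it is not one of the course slides) and assumes Python 2.6 or later, where floating point numbers have an as_integer_ratio() method. It compares a sum of powers of two, which is exact, with a sum of tenths, which is not.

```python
# Fractions that are powers of two are stored exactly.
exact_sum = 0.5 + 0.25 + 0.125
print(exact_sum == 0.875)            # True

# A tenth is only approximated, so adding tenths shows a small error.
tenths_sum = 0.1 + 0.1 + 0.1
print(tenths_sum == 0.3)             # False

# The value really stored for 0.1, as a ratio of two whole numbers.
# The bottom of the fraction is a power of two, not a power of ten.
numerator, denominator = (0.1).as_integer_ratio()
print(denominator == 2 ** 55)        # True
```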

In  practice  we  get  sixteen  significant  figures  of  accuracy  in  our  floating  point  numbers. 
We’re  going  to  ignore  this  issue  in  this  introductory  course  and  will  pretend  that 
numbers  are  stored  internally  the  same  way  we  see  them  as  a  user. 

Note for completeness: The number of significant figures of accuracy to which Python
stores floating point numbers depends on the precision of the double type of the
underlying C compiler that was used to compile the Python interpreter. (If you have no
idea what that statement meant, don’t worry about it; you don’t really need to know this
level of detail about Python.) What this does mean is that on most modern PCs you
will get about 16 significant figures of accuracy (Python prints up to 17), but the exact
precision may vary. From Python 2.6 onwards the sys.float_info object reports the
range and precision of floating point values on your machine; earlier versions provide
no direct way to find this out.
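
For the curious, here is a minimal sketch of how the precision can be inspected. It assumes Python 2.6 or later, where the sys module provides float_info; the numbers in the comments are what a typical PC reports and may differ on unusual hardware.

```python
import sys

info = sys.float_info

print(info.mant_dig)    # number of binary digits stored (53 on a typical PC)
print(info.dig)         # decimal digits that always survive storage (15)
print(info.epsilon)     # gap between 1.0 and the next larger float

# Adding the gap to 1.0 gives a different number; adding half of it does not.
print(1.0 + info.epsilon == 1.0)          # False
print(1.0 + info.epsilon / 2.0 == 1.0)    # True
```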


>>> 0.1 + 0.1 + 0.1
0.30000000000000004

If you are relying on the
17th decimal place you
are doing it wrong!


This  many  significant  figures  isn't  so  terrible.  If  you  are  relying  on  the  seventeenth  then 
you  are  sunk  anyway. 


Same  basic  operations 


>>> 5.0 + 2.0               >>> 5.0 * 2.0
7.0                         10.0

>>> 5.0 - 2.0               >>> 5.0 / 2.0
3.0                         2.5     Gets it right!

>>> 5.0 % 2.0               >>> 5.0 ** 2.0
1.0                         25.0

Let’s stick with simple floating point numbers for the time being. It won’t take long to
get in trouble again. The basic operations behave well enough and use exactly the
same symbols as are used for whole numbers.

Note that this time the division of 5.0 by 2.0 gives the right answer, 2.5. There is no
truncation to whole numbers.


>>> 4.0 * 4.0                   How far can
16.0                            floating point
                                numbers go?
>>> 16.0 * 16.0
256.0

>>> 256.0 * 256.0
65536.0

>>> 65536.0 * 65536.0
4294967296.0

So far, so good...

If  we  repeat  the  successive  squaring  trick  that  we  applied  to  the  integers  everything 
seems  fine  up  to  just  over  4  billion. 


>>> 4294967296.0 ** 2
1.8446744073709552e+19

17 significant figures ×10^19

1.8446744073709552×10^19 =
Approximate answer      18,446,744,073,709,552,000

4294967296 × 4294967296 =
Exact answer            18,446,744,073,709,551,616

Difference                                     384


If  we  square  it  again  we  get  an  unexpected  result.  The  answer  is  printed  as 

1.8446744073709552e+19 

This means 1.8446744073709552×10^19.

First note the notation used. Python uses the notation e+19 at the end of a number to
mean ×10^19. This representation is known as “exponential” or “scientific” form. We’ve
been dumped into it because we have reached the limits of accuracy that 17
significant figures can offer.

Second, note that the value printed is not the exact answer. It differs from the true
result by 384, which is small relative to the size of the number.

Positive  floating  point  numbers  can  be  thought  of  as  a  number  between  1  and  10 
multiplied  by  a  power  of  10  where  the  number  between  1  and  10  is  stored  to  17 
significant  figures  of  precision.  So  if  you  are  doing  mathematics  with  values  that  ought 
to  be  integers  you  should  stick  to  the  integers,  not  the  floating  point  numbers. 
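
The difference between the two kinds of number can be seen in a short sketch (added here for illustration; the particular values are our own choice). Whole numbers stay exact however large they get, while floating point numbers eventually cannot tell neighbouring whole numbers apart.

```python
# Whole-number arithmetic is exact, however large the values get.
exact = 4294967296 ** 2
print(exact)                          # 18446744073709551616

# A float has too few significant figures to notice an extra 1 here.
big = 2.0 ** 53                       # 9007199254740992.0
print(big + 1.0 == big)               # True: the 1.0 is lost

# The same sum done with integers keeps the 1.
print(2 ** 53 + 1 == 2 ** 53)         # False
```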


>>> 4294967296.0 * 4294967296.0
1.8446744073709552e+19

>>> 1.8446744073709552e+19 * 1.8446744073709552e+19
3.4028236692093846e+38

>>> 3.4028236692093846e+38 * 3.4028236692093846e+38
1.157920892373162e+77

>>> 1.157920892373162e+77 * 1.157920892373162e+77
1.3407807929942597e+154


Now  that  we’re  in  exponential  notation  can  we  continue  the  squaring  further?  At  first 
glance,  yes  we  can. 


“Overflow  errors” 

>>> 1.3407807929942597e+154 * 1.3407807929942597e+154
inf                             Floating point infinity

But no. Even in this form, floating point arithmetic has its limits. If a result goes beyond
approximately 10^308 we get an “infinite” answer. Floating point systems have a special
code for “number too big to fit” which they casually describe as “infinity”. Python prints
this out as the three letters “inf”.
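
Here is a small sketch of how this special value behaves once we have it. It is an illustration added to these notes and assumes Python 2.6 or later for math.isinf(). Note that we use multiplication, as on the slide; the ** operator reports an OverflowError instead of returning infinity.

```python
import math

big = 1.3407807929942597e+154       # the last value from the slide
too_big = big * big                 # beyond the largest float

print(too_big)                      # inf
print(math.isinf(too_big))          # True
print(too_big > 1e308)              # True: infinity is above every ordinary float
print(too_big + 1.0 == too_big)     # True: further arithmetic leaves it unchanged
```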


Floating  point  limits 

1.2345678901234567 × 10^N

17 significant figures

-325 < N < 308

Positive values:

4.94065645841e-324 < x < 8.98846567431e+307

So floating point numbers, while they can handle fractions (unlike integers), have limits.
They are limited in accuracy and range. On the typical PC we get seventeen
significant figures and scales between 10^-324 and 10^308.
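
The two ends of the range can be probed with a short sketch. This is our own illustration: it assumes Python 2.6 or later for sys.float_info and a typical PC that follows the IEEE 754 standard, and the value it reports for the largest float may differ from the rounded figure quoted on the slide.

```python
import sys

largest = sys.float_info.max        # largest finite float on this machine
smallest = 5e-324                   # smallest positive float on a typical PC

print(largest)
print(largest * 2.0)                # inf: too big to fit
print(smallest > 0.0)               # True
print(smallest / 2.0)               # 0.0: too small to tell apart from zero
```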


Progress 

Floating point numbers          1.25        →  1.25
                                1.25×10^5   →  1.25e5

Limited accuracy                (but typically
Limited range of sizes          good enough)

Mathematical operations
    a+b   a-b   a×b   a÷b   a^b
    a+b   a-b   a*b   a/b   a**b

Exercise

In  Python,  calculate: 

1.  12.0 + 4.0
2.  12.0 - 4.0
3.  12.0 ÷ 4.0
4.  12 ÷ 40.0
5.  25.0^0.5
6.  5.0^10
7.  1.0×10^20 + 2.0×10^10
8.  1.5×10^20 + 1.0

Which of these answers is “wrong”?

(3 minutes)

In  this  case  “wrong”  means  not  precisely  correct. 


Strings 


“The  cat  sat  on  the  mat.” 


“Lorem  ipsum  dolor  sit  amet,  consectetuer  adipiscing  elit.  D 
onec  at  purus  sed  magna  aliquet  dignissim.  In  rutrum  libero 
non turpis. Fusce tempor, nulla sit amet pellentesque feugi
at,  nibh  quam  dapibus  dui,  sit  amet  ultrices  enim  odio  nec  i 
psum.  Etiam  luctus  purus  vehicula  erat.  Duis  tortor  lorem,  c 
ommodo  eu,  sodales  a,  semper  id,  diam.  Praesent ..." 

Finally in this review of Python types we will look at text.

Python stores text as “strings of characters”, referred to as “strings”.

PS: See http://www.lipsum.com/ for the history of the “lorem ipsum” typesetting
test text.


Quotes 

>>> 'Hello, world!'         Quotes: “Hey, this is text!”
                            Inside the quotes: the value of the text object
'Hello, world!'             How Python represents the text object.
>>>

Simple text can be represented as that text surrounded by either single quotes or
double quotes. Here we use single quotes.

Again,  because  of  the  historical  nature  of  keyboards,  computing  tends  not  to 
distinguish  opening  and  closing  quotes.  The  same  single  quote  character,  ',  is  used 
for  the  start  of  the  string  as  for  the  end. 

The  quotes  are  not  part  of  the  text;  they  simply  indicate  that  the  lump  of  text  should  be 
interpreted  by  Python  as  a  text  object. 

If  we  type  a  string  into  interactive  Python  then  it  responds  as  usual  with  that  value. 
Note  that  Python  uses  the  same  single  quotes  notation  to  indicate  that  this  is  a  text 
object. 


Why  do  we  need  quotes? 

3           →  It’s a number

print       Is it a command?
            Is it a string?

'print'     →  It’s a string

print       →  It’s a command

Up till now, we have seen no difference between a raw value and a printed value.
Integers and floating point numbers look the same either way. This is because Python
doesn’t need any syntactic assistance to recognise integers or floating point numbers.
It does need help with text, though. A string of characters like “print” might be either
a literal string, to be evaluated and returned just like a number, or a command to be
run.

With  quotes  it  is  a  literal  string. 

Without  quotes  it  is  something  that  Python  will  process,  such  as  a  command. 


>>> print('Hello, world!')
    print               Python command
    ' ... '             “This is text.”
    Hello, world!       The text.
Hello, world!
>>>

print only outputs the value of the text.

The print function outputs the raw text, without any surrounding quotes.


Double  quotes 

>>> "Hello, world!"         Quotes: “Hey, this is text!”
'Hello, world!'             Single quotes
>>>

We can also use double quotes around the text. It makes no difference at all to the
text object created. Again because of limitations on traditional keyboards we use the
same double quote character at the end as the start of the string.

One  of  the  effects  of  it  making  no  difference  is  that  if  we  input  a  string  with  double 
quotes  Python  may  well  show  it  with  single  quotes.  This  is  how  Python  represents 
strings.  It  has  no  memory  of  what  quotes  were  used  to  input  it  in  the  first  place. 
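
A quick check confirms this. The sketch below is an added illustration; it uses names (single, double) and the repr() function, neither of which we have met yet. repr() gives the same quoted representation that interactive Python shows.

```python
single = 'Hello, world!'
double = "Hello, world!"

print(single == double)     # True: the same text either way
print(repr(double))         # 'Hello, world!'  (how Python represents it)
print(double)               # Hello, world!    (just the text)
```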


Single quotes           Double quotes

'Hello, world!'         "Hello, world!"

Both define the
same text object.

The only condition on using single or double quotes is that you must use the same at
either end of the string. You cannot start with one and end with the other.


Mixed  quotes 

>>> print 'He said "Hello" to her.'
He said "Hello" to her.

>>> print "He said 'Hello' to her."
He said 'Hello' to her.

The advantage of being able to use either single or double quotes to identify text to the
Python interpreter is that we have an easy way to create text objects that have quotes in
them. If you want a text object with double quotes in it then define it with single quotes
around it. If you want one with single quotes in it define it with double quotes around it.


Joining  strings  together 

>>> 'He said' + 'something.'
'He saidsomething.'

>>> 'He said ' + 'something.'
'He said something.'

Python has various facilities for manipulating strings of characters. We will see two at
this point. Strings can be joined together with the “+” operator. Note that no spaces
are added as strings are joined.


Repeated  text 

>>> 'Bang! ' * 3
'Bang! Bang! Bang! '

>>> 3 * 'Bang! '
'Bang! Bang! Bang! '

We can also repeat a string by “multiplying it by a number”.
Note that both "Bang! " * 3 and 3 * "Bang! " are valid.
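
The two operations can be combined. The following sketch is an added illustration using the strings from the slides; the name bangs is our own.

```python
# Joining adds nothing between the pieces, so supply any spaces yourself.
print('He said' + 'something.')      # He saidsomething.
print('He said ' + 'something.')     # He said something.

# Repetition works with the number on either side.
bangs = 'Bang! ' * 3
print(bangs)                         # Bang! Bang! Bang!
print(bangs == 3 * 'Bang! ')         # True
```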


Progress 

Strings 

Use  quotes  to  identify  (matching  single  or  double) 
Use  print  to  output  just  the  value 
String operations

Exercise

Predict what interactive Python will print when you
type the following expressions. Then check.

1.  'Hello, ' + "world!"
2.  'Hello!' * 3
3.  "" * 10000000000        (That's two adjacent double quote signs.)
4.  '4' + '2'

Feel free to write your predictions on the notes; it helps stop you cheating
yourself. If you can't understand why you get any of the answers, ask.


Line  breaks 


Problem:  Suppose  we  want  to  create  a 
string  that  spans  several  lines. 


>>> print('Hello,
world!')

>>> print('Hello,
SyntaxError: EOL while scanning
string literal                      EOL: “end of line”

So far we have looked at simple, short strings. Suppose we wanted some text that
was long enough to require line breaks, or a short piece of text where we wanted to
include some line breaks for formatting reasons.

We hit a problem. If we try to create a string the way we have been doing so far the
Python system throws an error when we hit the [↵] key.


The line break character

Solution: Provide some other way to mark
“line break goes here”.

>>> print('Hello,\nworld!')
Hello,
world!

\n  →  new line

If we can't press [↵] to signal “line break goes here” we need some other way to do it.
Python uses a common convention (originating in the C programming language) that
the pair of characters “\n” represents the “new line character”.

The first character is called a “backslash”. Note that it is not the same as the forward
slash, “/”, which Python uses for arithmetic division.

On  most  modern  operating  systems  line  breaks  are  recorded  in  the  data  as  an  explicit 
character  or  set  of  characters.  They  don't  agree  on  what  the  characters  should  be,  but 
“\n”  is  what  our  platforms  use. 


The  line  break  character 

'Hello,\nworld!'

  H    e    l    l    o    ,    ↵    w    o    r    l    d    !
  72  101  108  108  111   44   10  119  111  114  108  100   33

                           ↑
                  A single character

Note that “\n” is just a way to represent the new line character. There are not two
characters there; there's only one.

Internally  characters  are  represented  as  numbers,  and  the  new  line  character  has  a 
number  just  like  each  of  the  letters. 
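
We can ask Python to confirm both points. This sketch is an added illustration; it uses the built-in functions len() (the number of characters in a string) and ord() (the number behind a single character), which have not been introduced in the course so far.

```python
text = 'Hello,\nworld!'

print(len(text))        # 13: six characters, the new line, six more
print(ord('\n'))        # 10: the number behind the new line character
print(ord('H'))         # 72
print(repr(text))       # 'Hello,\nworld!'  (how Python represents it)
print(text)             # printed over two lines
```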


Special characters

“New line” is not the only special character like this.

The machines in our public classrooms have had their speakers disabled so you can't
hear the beep from “\a” (“alarm”). The sequence “\t” gives the tab character.

The  backslash  can  also  be  used  to  introduce  ordinary  characters  where  they  would 
otherwise  have  special  meaning.  We  can  use  it  to  introduce  quote  marks  without 
worrying  about  the  quotes  around  the  string,  for  example. 

Also,  we  have  to  backslash  the  backslash  character  if  we  want  it  in  a  string. 
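
A few of these in action, as a small added sketch. The names and the example text (including the Windows-style path) are our own inventions for illustration.

```python
quoted = 'It\'s a "quote"'      # \' puts a single quote inside single quotes
slash = 'C:\\temp'              # \\ gives one backslash
tabbed = 'one\ttwo'             # \t gives a tab

print(quoted)                   # It's a "quote"
print(slash)                    # C:\temp
print(len('\\'))                # 1: two keystrokes, one character
print(tabbed)
```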

For  interested  readers  only: 

There  are  more  white  space  characters  than  new  line  and  tab,  by  the  way.  Python 
supports  these  less  commonly  needed  sequences  too: 


\a      Bell/alarm              print('beep\a beep\a')
\b      Backspace               print('abc\bdef')
\x1b    [Esc]                   (Python has no “\e” sequence; use “\x1b” or “\033”)
\f      Form feed               print('abc\fdef')
\n      New line/Line feed      print('abc\ndef')
\r      Carriage return         print('abc\rdef')
\t      Horizontal tab          print('abc\tdef')
                                print('ab\tcdef\nabc\tdef\nabcd\tef')
\v      Vertical tab            print('abc\vdef')

Many of these hark back to the days of teletype printers.

Be careful with [Esc]. It can be used to send instructions to your terminal, rendering it
potentially unusable until reset.


“Long”  strings 

>>> '''Hello, ↵                 Three single quote signs
world!'''
'Hello,\nworld!'                An ordinary string, with
>>>                             an embedded new line character

But this is fiddly. We want to be able to just hit the [↵] key.

Python has some special support for “long strings” where line breaks are likely to be
required. If we start a literal string with three single quotes then we can just hit the [↵]
key the way we would like to. These strings can span as many lines as we want and
close with a matching triplet of quotes.

Note that the string that is created this way is just another string. The triple quotes
procedure is just a trick to enter long strings more easily. It doesn't create a new type
of string.
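
To see that the two ways of typing give the same object, consider this small added sketch (the names long_form and short_form are ours):

```python
long_form = '''Hello,
world!'''
short_form = 'Hello,\nworld!'

print(long_form == short_form)                  # True: the same string
print(type(long_form) == type(short_form))      # True: the same type
```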


What  the  string  is  vs. 
how  the  string  prints 

'Hello,\nworld!'            Hello,
                            world!

It's not just quotes vs. no quotes!


The  new  line  character  emphasizes  the  difference  between  the  way  Python  represents 
an  object  (e.g.  a  string  with  its  quotes  and  special  characters  shown  in  strange  ways) 
and  the  way  it  prints  that  object  (which  interprets  those  special  characters). 


Single  or  double  quotes 

>>> """Hello, ↵                 Three double quote signs
world!"""
'Hello,\nworld!'                The same string
>>>

Note that for the long string trick we can use a triplet of either single or double quotes,
but they must match at the two ends.


Long  strings 


'''Lorem ipsum dolor sit amet, consectetuer
adipiscing elit. Donec at purus sed magna aliquet
dignissim. In rutrum libero non turpis. Fusce
tempor, nulla sit amet pellentesque feugiat, nibh
quam dapibus dui, sit amet ultrices enim odio nec
ipsum. Etiam luctus purus vehicula erat. Duis
tortor lorem, commodo eu, sodales a, semper id,
diam.'''

There is no limit to how long a long form literal string can be.


Progress 

Entering arbitrarily long strings       Triple quotes

Dealing with line breaks                """ ... """
                                        ''' ... '''

Other “special” characters              \n  \t  ...

Exercise

Predict the results of the following instructions.
Then check.

1.  print('Goodbye, world!')
2.  print('Goodbye,\nworld!')
3.  print('Goodbye,\tworld!')

Comparisons

Are two values the same?                5+2  ?  7

Is one bigger than the other?           5+2  ?  8
Is “bigger” even meaningful?

Now we have values we can start comparing them. We can ask if two values are the
same, obviously, but we can also ask if one is bigger than the other. For numbers this
makes obvious sense but for other sorts of values it might make none at all.


Comparisons 

>>> 5 > 4               A comparison operation
True                    A comparison result

>>> 5.0 < 4.0
False                   Only two values possible

For numerical comparisons we can use the symbols provided on the keyboard. If we
type a comparison at the interactive Python prompt we are told whether or not the
comparison is correct. (We will return to “True” and “False” soon.)

Note  that  we  can  compare  whole  numbers  and  floating  point  numbers. 
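
For instance, a whole number and a floating point number can appear in the same comparison. The sketch below is an added illustration; the names whole and floating are our own.

```python
whole = 3
floating = 3.5

print(whole < floating)     # True
print(floating < whole)     # False
print(whole > 2.5)          # True
```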


Equality  comparison 

n.b. double equals

>>> 5 == 4
False

Perhaps the most important comparison is to test whether two values are equal.
The operator to do this is a double equals sign. The single equals sign is used for
something else and we will meet it shortly, but for comparing two values we use a
double equals sign.


Useful  comparisons 

>>> (2**2)**2 == 2**(2**2)
True

>>> (3**3)**3 == 3**(3**3)
False

Comparing 4 and 5 interactively is hardly useful though, so here’s one you may have
to think about.
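
If the results surprise you, it helps to look at each side on its own. This is an added sketch, not one of the slides; the last line also shows which grouping Python chooses when we leave the brackets out.

```python
print((2**2)**2)        # 16: 4 squared
print(2**(2**2))        # 16: 2 to the power 4

print((3**3)**3)        # 19683: 27 cubed
print(3**(3**3))        # 7625597484987: 3 to the power 27

# Without brackets Python works from the right, like the second form.
print(3**3**3 == 3**(3**3))     # True
```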


All  numerical  comparisons 


Python          Mathematics

x == y          x = y
x != y          x ≠ y
x < y           x < y
x <= y          x ≤ y
x > y           x > y
x >= y          x ≥ y

There  are  six  numerical  comparisons  in  total.  The  “strictly  less  than”  and  “strictly 
greater  than”  comparisons  simply  use  their  symbols  on  the  keyboard  ([Shift]+[,]  for  [<] 
and  [Shift]+[.]  for  [>]  on  the  keyboards  you  are  most  likely  to  use).  The  other 
comparisons  use  double  characters  (which  must  not  be  split  by  spaces). 


Comparing  strings 

>>> 'cat' < 'mat'
True

>>> 'bad' < 'bud'           Alphabetic order.
True

>>> 'cat' < 'cathode'
True

When we compare numbers there is an obvious “right” answer. When we compare
strings we use alphabetical order.


Comparing  strings 

>>> 'Cat' < 'cat'
True

>>> 'Fat' < 'cat'
True

ABCDEFGHIJKLMNOPQRSTUVWXYZ...
abcdefghijklmnopqrstuvwxyz

But what about mixed case words? Python orders all the upper case letters in front of
all the lower case letters.
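
The reason lies in the numbers behind the characters. The sketch below is an added illustration; it assumes plain English letters and uses ord() and the string method lower(), neither of which is covered in the course so far. The example words are our own.

```python
# Every upper case letter has a smaller number than any lower case letter.
print(ord('A'))     # 65
print(ord('Z'))     # 90
print(ord('a'))     # 97

print('Zebra' < 'apple')                    # True: 'Z' comes before 'a'
print('zebra' < 'apple')                    # False

# Converting both to lower case first gives ordinary dictionary order.
print('Zebra'.lower() < 'apple'.lower())    # False
```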


Progress 

Six  comparisons: 

==   !=   <   <=   >   >=
=    ≠    <   ≤    >   ≥

Numbers:
numerical order

-3  -2  -1   0   1   2   3

Strings:
alphabetical order

ABCDEFGHIJKLMNOPQRSTUVWXYZ...
abcdefghijklmnopqrstuvwxyz

Exercise

Predict  whether  Python  will  print  True  or  False 
when  you  type  the  following  expressions. 

Then  check. 

1.  100  <  100 

2.  3*45  <=  34*5 

3.  'One'  <  'Zero' 

4.  1  <  2.0 

5.  0  <  1/10 

6.  0.0  <  1.0/10.0

Truth and Falsehood

True and False
“Boolean” values

Same status as numbers, strings, etc.

5 + 4  →  9         Whole number
5 > 4  →  True      Boolean

We have seen interactive Python respond “True” and “False” to our comparison
enquiries. These are not just remarks from Python but genuine values. They are values of
a type called “Boolean” which can only take two values: True and False. This new type
has the same status in Python as integers, floating point numbers, strings etc.

Just as the “plus” operator takes two integers and gives an integer, the “greater than”
operator takes two integers and returns a Boolean.
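
We can ask Python what type a result has. This added sketch uses the built-in type() function and the type names bool and int, which the course has not introduced at this point; the name result is our own.

```python
result = 5 > 4

print(result)                   # True
print(type(result) == bool)     # True: a comparison gives a Boolean
print(type(5 + 4) == int)       # True: an addition gives a whole number
```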


Combining booleans

      True        True
>>>  1 < 2  and  5 < 6
True                        Both True

      True        False
>>>  1 < 2  and  5 > 6
False                       Not both True

Now that we have booleans as values we can manipulate them. Just as there are
operators that combine integers to create integers (1 + 1 gives 2, etc.) there are
operators that combine booleans to give booleans.

The first we will meet is “and”. This takes two booleans and if both of them are True
gives True as a result. If either or both of them is False then it gives False.


Combining booleans

      True       True
>>>  1 < 2  or  5 < 6
True                        Either True

      True       False
>>>  1 < 2  or  5 > 6
True                        Either True

Similar to “and” is “or”. This returns True if either or both of the given values is
True.


Combining booleans

      False      False
>>>  1 > 2  or  5 > 6
False                       Neither True

The “or” operator only returns False when both its arguments are False.


Negating  booleans 


>>> 1 > 2                   True   →  False
False                       False  →  True

>>> not 1 > 2
True

>>> not False
True

There is one other boolean operator we need to know about. The “not” operator
inverts a boolean value. It turns True into False and vice versa.
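
All the combinations can be listed in one go. The sketch below is an added illustration; it uses a loop and a list, which come later in the course, so treat it as a preview and not as something you are expected to write yet. The names rows, a and b are our own.

```python
# Build every combination of two boolean values with its "and" and "or".
rows = []
for a in (True, False):
    for b in (True, False):
        rows.append((a, b, a and b, a or b))

for row in rows:
    print('%-5s %-5s   and: %-5s   or: %-5s' % row)

print(not True)     # False
print(not False)    # True
```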


Not  equal  to... 

>>> 1 == 2
False

>>> 1 != 2
True

>>> not 1 == 2
True

Note that the “not” operator gives us two ways to test for whether two values are
unequal.


Progress

“Booleans”          True    False

Combination         and     or

Negation            not

Exercise

Predict whether Python will print True or False
when you type the following expressions.

Then check.

1.  1 > 2 or 2 > 1
2.  1 > 2 or not 2 > 1
3.  not True
4.  1 > 2 or True

Before we finish with all this value juggling there is one last thing to address. More
complex expressions that involve more than one operator need to have some rules for
which operator is dealt with first.

For  example  there  are  two  possible  interpretations  for  “12+8/4”. 


Standard interpretation

12 + 8/4

12 + (8/4)              (12 + 8)/4
12 + 2                  20/4
14                      5

Traditionally (or “as human beings”) we always interpret this according to the rules
on the left hand side of the slide, but for a computer we need to be explicit. We do the
division before we do the addition.


Division  before  addition 

12 + 8/4            Initial expression

12 + 8/4            Do division first
12 + 2

12 + 2              Do addition second
14

Some people say that the division “binds more tightly” than addition. I prefer to say
that division goes first.


Precedence 

Division before addition
        ↓
An order of execution
        ↓
“Order of precedence”

So, if division goes before addition, then we have an idea of an order that all the
operators get executed in. This is called the “order of precedence”.


Precedence 

First       **   %   /   *   -   +              Arithmetic

            ==   !=   >=   >   <=   <           Comparison

            not   and   or                      Logical

In a nutshell this is it. Exponentiation goes first. The remainder, division and
multiplication operators come next; they share one level and are worked out from left
to right. Addition and subtraction follow in the same way, then the comparisons, and
finally the logical operators: not, then and, then or.

Mostly this just does “what you expect”.


Parentheses 

Avoid  confusion! 

18/3*3              “Check the precedence rules”

18/(3*3)            “Ah, yes!”

However, if there is any chance of confusion you should use parentheses (round
brackets). Even if you’re not confused, if you think it would be easier for your reader to
understand your expression with brackets, use them.
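
Brackets also give us a way to check our own reading of an expression: write it with and without them and compare. This is an added sketch using the expressions from the slides.

```python
# Division goes before addition.
print(12 + 8/4 == 12 + (8/4))       # True
print(12 + 8/4 == (12 + 8)/4)       # False

# Division and multiplication are taken from left to right.
print(18/3*3 == (18/3)*3)           # True
print(18/3*3 == 18/(3*3))           # False
```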


Exercise 

Predict  what  Python  will  print  when 
you  type  the  following  expressions. 
Then  check. 

1.  12/3*4 

2.  3 > 4 and 1 > 2 or 2 > 1

Exercise: √2 by “bisection”


Now  we’ll  do  a  more  significant  example. 

This  is  the  start  of  a  build-up  to  a  real  Python  program.  Computer  programs  run  mind- 
numbingly  tedious  routines  very  quickly  (so  that  we  don't  have  to).  Unfortunately,  to 
understand  just  what  the  computer  is  going  to  be  doing,  we  need  to  understand  the 
mind-numbing  bit  too.  Sorry.  It  won't  last  too  long. 

We are going to get a (poor) approximation to the square root of 2, that is the positive
number that when multiplied by itself gives 2. We will use a method called “bisection”
and we will do it manually. Later we will learn the Python to automate the process.
Bisection works by starting with two estimates for √2, one too small and one too large.
Each stage of the process starts by calculating the mid-point of the two estimates and
seeing if it is too big or too small itself by squaring it and comparing it against 2. If it is
too big then we switch our attention to the smaller interval running from the old “too
small” estimate to the mid-point, which is our new “too large” estimate. If the mid-point
is too small then we switch attention to the interval running from the mid-point, which
becomes our new “too small” estimate, to the original “too large” estimate.

So, each step of the process reduces the size of the interval from “too small” to “too
large” by a factor of 2. This converges very quickly but as we are doing it manually we
will only do six steps ourselves.

So this slide shows the initial stage. We mark with a red bar the interval between our
lower and upper estimates (1.0 and 2.0) and its corresponding range of squared
values (1.0 to 4.0). We start with this interval (that contains √2) having length 1.0.


Exercise: √2 by “bisection”


We find the mid-point (1.5) and calculate its square (2.25).


Exercise: √2 by “bisection”


Next we ask if the squared value is greater than 2.0 or less than it. It is greater than
2.0 so we reduce the upper bound to this mid-point. (Otherwise we would have raised
the lower bound.)

The interval containing √2 now has length 0.5.


Now we repeat the process.

We find the new mid-point (1.25) and calculate its square (1.5625).


We ask if the mid-point squared is greater than 2.0. This time it isn’t so we raise the
lower bound to the mid-point.

The interval containing √2 now has length 0.25.


We do a third iteration. We find the mid-point of this latest interval (1.375) and calculate
its square (1.890625).


We ask if that square is greater than 2.0.

It isn’t so again we raise the lower bound to the mid-point.

The interval containing √2 now has length 0.125. The uncertainty over the value of √2
is ⅛ of its original size.
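
For the curious, the three steps so far can be written as a short program. This is only a sketch of where we are heading, added for illustration: the function name bisect_sqrt2 and the variable names are our own, and it uses a function and a loop, neither of which we have met yet. We stop after three steps so as not to spoil the exercise.

```python
def bisect_sqrt2(steps):
    """Return (lower, upper) bounds on the square root of 2 after bisecting."""
    lower = 1.0     # too small: 1.0 squared is 1.0
    upper = 2.0     # too large: 2.0 squared is 4.0
    for _ in range(steps):
        middle = (lower + upper) / 2.0
        if middle * middle > 2.0:
            upper = middle      # mid-point too big: reduce the upper bound
        else:
            lower = middle      # mid-point too small: raise the lower bound
    return lower, upper

lower, upper = bisect_sqrt2(3)
print(lower)            # 1.375
print(upper)            # 1.5
print(upper - lower)    # 0.125
```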


Exercise: √2 by “bisection”

Three more iterations, please.

We have been using Python as a calculator to determine mid-points, squares and
whether numbers were bigger than 2.0. To check your understanding of Python we
would like you to do it manually three more times (to get an interval of size 0.015625).


So  far . . . 


...using Python
as a calculator.

So far we have used Python as a calculator. We needed to do that to get used to
some of its properties, but it’s capable of so much more.

(Picture © Christian "VisualBeo" Horvat, distributed under the Creative Commons
Attribution ShareAlike 3.0 licence.

http://commons.wikimedia.org/wiki/File:Calculator_casio.jpg)


Now  ... 


...use Python
as a computer.

Now we are going to start using it as a real computer programming language. So we
need to get a little computer-y.

(Featured computer: a PDP-12

Picture by Bjarni Juliusson, who placed it in the public domain.
http://commons.wikimedia.org/wiki/File:PDP-12-Update-Uppsala.jpeg)


How  Python  stores  values 

Lump  of  computer  memory 


Identification  of 
the  value’s  type 


Identification of
the specific value

We’ll get computer-y by looking briefly at how Python stores the values we’ve been
looking at in system memory.

Python stores a value as a record of what type the value is followed by the data
corresponding to the specific value. The computer can’t interpret that data without
knowing what type of value it is representing.


How  Python  stores  values 


42 

4.2×10¹

'Forty two'

True

Just for interest, not all programming languages do this. Others require the program to
remember what type a lump of system memory contains. Bugs ensue when the
programmer gets it wrong and interprets an integer as a floating point number or a
string, etc.
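As a small illustration (not part of the original course material), we can ask Python which type it has recorded alongside each of the four values on the slide. The built-in type() function is used here only for this sketch; the exact text it prints differs slightly between Python versions.

```python
# Each value carries a record of its type as well as its data.
print(type(42))           # an integer
print(type(4.2e1))        # a floating point number
print(type('Forty two'))  # a string of text
print(type(True))         # a Boolean (true/false) value
```

The four values look related to a human reader, but Python treats them as four different kinds of thing because of the type stored with each one.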


Variables 

Attaching a name to a value.

>>> 40 + 2          An expression
42                  The expression’s value

>>> answer = 42     Attaching the name answer to the value 42.

>>> answer          The name given
42                  The attached value returned

Now let’s get really computer-y. We are going to start attaching names to our values
so we can manipulate them within our programs.

We have seen that if we enter a value at the Python prompt Python responds with that
value. If we type in an expression (e.g. 40+2) then Python evaluates it and replies with
the expression’s value (42 in this case).

Now we will type in a radically different kind of instruction. We type in “answer = 42”
(n.b. single equals sign and no quotes around the word answer). Python gives no
response.

But now we can just type in the word “answer” (without any quotes) and Python
evaluates it to have the value 42 that featured in the previous instruction.


Variables 

>>> answer = 42

answer      The name being attached (no quotes)
=           A single equals sign
42          The value being named

Let’s look at that operation more closely.

1. We start with the name that is going to be attached to a value. Incidentally, if that
name was previously attached to a different value then it gets detached from that one
and re-attached to this new value.

2. We follow the name with a single equals sign. You may recall that when we met
the equality comparison operator (the double equals sign) we said we would meet the
single equals sign later. This is that moment.

3. Finally we put the value we want the name attached to.

The formal name for this operation is “assignment”. The name is assigned the
value 42.
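A minimal sketch of the detaching and re-attaching described in step 1. The second name, first_answer, is hypothetical and is there only so that we can still see the original value afterwards.

```python
answer = 42             # the name answer is attached to the integer 42
first_answer = answer   # a second name attached to the same value
answer = 'Forty two'    # answer is detached from 42 and re-attached to a string

print(first_answer)     # 42
print(answer)           # Forty two
```

Note that re-attaching answer did nothing to the value 42 itself; the other name is still attached to it.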


Equals  signs 

==    Comparison:
      “are these equal?”

=     Assignment:
      “attach the name on the left to
      the value on the right”

Just to emphasize:

one equals sign → assignment

two equals signs → comparison
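The difference can be seen side by side in a short sketch. The names matches and differs are made up for this illustration.

```python
answer = 42                 # one equals sign: assignment
matches = (answer == 42)    # two equals signs: comparison, gives True or False
differs = (answer == 40)

print(matches)   # True
print(differs)   # False
print(answer)    # 42, because comparing does not change what answer is attached to
```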


Order  of  activity 


>>> answer = 42

1. Right hand side is evaluated.                  (value: int 42)

2. Left hand side specifies the attached name.    (variables: answer)


We  typed  from  left  to  right.  The  computer  processes  the  instruction  the  other  way 
round,  though. 

1.  The  expression  on  the  right  hand  side  is  evaluated  to  give  the  value  that  will 
have  a  name  attached  to  it. 

2.  Once  the  value  is  determined  the  left  hand  side  is  interpreted  to  get  the  name  to 
attach.  (Later  we  will  meet  more  complicated  left  hand  sides  that  require  a  measure  of 
evaluation  themselves.) 


Example — 1

>>> answer = 42        Simple value
>>> answer
42
>>>

In the example we saw the right hand side was a literal value. This is the easiest case
where the evaluation is simply “that’s the integer 42”.


Example — 2

>>> answer = 44 - 2        Calculated value
>>> answer
42
>>>

The next level up in complexity is when there is an expression on the right hand side
that requires actual evaluation. The expression “44 - 2” is evaluated to a value
“integer 42” and after that the name “answer” is attached to it.


Example — 3

>>> answer = 42
>>> answer
42                          “Old” value
>>> answer = answer - 2     Reattaching the name to a different value.
>>> answer
40                          “New” value
>>>

But we can go further. Because the right hand side is evaluated completely before the
left hand side (the name) is looked at, it can contain names itself, including the name
that is about to be assigned to!

So, suppose we attach the name “answer” to the value “integer 42”. We can then use
that name in the right hand side of a following assignment.


Example  —  3  in  detail 

answer = answer - 2     R.H.S. processed 1st

answer = 42 - 2         Old value used in R.H.S.

answer = 40             R.H.S. evaluated

answer = 40             L.H.S. processed 2nd

answer = 40             L.H.S. name attached to value


The  process  of  evaluating  the  right  hand  side  before  the  left  hand  side  is  rigorously 
enforced. 

1.  The  expression  “answer  -  2”  is  evaluated.  The  name  “answer”  appears  in  it  and 
is  evaluated  to  be  its  current  value,  “integer  42”.  So  the  right  hand  side  is  partially 
evaluated  to  be  “42  -  2”.  This  evaluation  is  then  completed  to  give  a  final  value  of 
“integer  40”. 

2.  Then  and  only  then  is  the  left  hand  side  looked  at.  This  contains  a  name, 
“answer”.  That  name  is  currently  attached  to  a  different  value  so  it  is  detached  from 
that  and  re-attached  to  its  new  value.  Where  this  value  came  from  is  not  relevant. 
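The two steps can be watched in a short sketch. The extra name previous is hypothetical; it is attached to the old value purely so that we can print it after answer has moved on.

```python
answer = 42
previous = answer       # keep a second name on the old value

answer = answer - 2     # the right hand side (42 - 2) is evaluated first,
                        # and only then is answer re-attached to the result
print(previous)         # 42
print(answer)           # 40

answer = answer - 2     # the same instruction again now starts from 40
print(answer)           # 38
```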


Using  named  variables  —  1 


upper    2.0
lower    1.0

Let’s put named values (“variables”) to work.

We’ll revisit the square root of two example we met previously. This time, instead of
copying and pasting (or retyping) we’ll attach names to the values.

We start, as before, with initial upper and lower bounds. This time, however, we will
attach names to them.

>>> upper = 2.0
>>> lower = 1.0
>>>

The names we pick are “upper” and “lower”. It is always a good idea to pick
meaningful names. Avoid the algebraist’s approach of calling things “x” and “y”.


Using named variables — 2

Next we calculate the mid-point. Again we attach a name to the value and use the two
existing names to calculate it.

>>> middle = (upper + lower)/2.0
>>> middle
1.5
>>>

N.B. The first instruction is all one line.


Using named variables — 3

>>> middle**2 > 2.0
True
>>> upper = middle

We need to square the mid-point value and compare it with two to see if it is above
(True) or below (False) the square root of two.

>>> middle**2 > 2.0
True

Because it is above the exact value we reduce the upper bound to the mid-point.
Using names for values makes this easy. We simply issue an instruction to attach the
name “upper” to the mid-point’s value, which currently has the name “middle”
attached to it.

Recall that there is no problem with having more than one name attached to a value.

>>> upper
2.0
>>> lower
1.0
>>> middle
1.5
>>> upper = middle
>>> upper
1.5
>>> lower
1.0
>>> middle
1.5

What matters is that we changed the value upper was attached to rather than lower
because of the results of the comparison.


Using  named  variables  —  4 


>>> middle = (upper + lower)/2.0
>>> middle
1.25
>>> middle**2 > 2.0
False
>>> lower = middle

Now it’s easy to repeat.

Recall that pressing the up-arrow [↑] on your keyboard will recall previous lines in
Python.

We simply repeat the calculation of middle from the current (updated) values of
upper and lower, compare its square to 2.0 and then, depending on whether
middle’s square is larger or smaller than 2.0 we change the value of upper or
lower.

This time middle’s square is smaller than 2.0 so we increase the value of lower.


Using named variables — 5

>>> middle = (upper + lower)/2.0
>>> middle
1.375
>>> middle**2 > 2.0
False
>>> lower = middle

And again.


upper = 2.0
lower = 1.0

So we are really caught in a loop. We start with a couple of named values: upper and
lower which define the limits of the interval containing √2.

Then the loop starts.

We calculate the mid-point and attach the name “middle” to it.

Then we square middle and test to see if it is bigger than 2.0.

If it is (True) we lower the interval’s upper bound by changing the value upper is
attached to.

If it isn’t (False) then we raise the lower bound by changing the value lower is
attached to.

We keep track of our progress by printing the value of the mid-point. We could just as
well have printed this as soon as we calculated it, but it will be didactically useful later
on to have an explicit instruction here.
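One pass round that loop, still driven by hand, might be sketched as follows. The name too_high is made up for this illustration; the decision about which bound to move is still ours, based on the value it shows.

```python
upper = 2.0
lower = 1.0

# Calculate the mid-point and attach the name middle to it.
middle = (upper + lower) / 2.0

# Square it and test against 2.0.
too_high = middle**2 > 2.0
print(too_high)     # True, because 1.5**2 is 2.25

# The test was True, so we lower the upper bound ourselves.
upper = middle

# Keep track of our progress.
print(middle)       # 1.5
```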


Homework: √3 by bisection

Three iterations, please.

Start with
upper = 3.0
lower = 1.0

Test with
middle**2 > 3.0

Print middle at the end of each stage:
print(middle)

Got that?

Let’s put it to the test. Can you calculate an approximation to the square root of three
(√3) by running three iterations of this loop, testing middle**2 against 3.0? Start with
lower set to 1.0 and upper set to 3.0.

>>> upper = 3.0
>>> lower = 1.0

>>> middle = (upper+lower)/2.0
>>> middle
2.0
>>> middle**2 > 3.0
True
>>> upper = middle

>>> middle = (upper+lower)/2.0
>>> middle
1.5
>>> middle**2 > 3.0
False
>>> lower = middle

>>> ...


Still  not  a 
computer 
program! 


We’ve still not delivered on our promise to write a computer program. We have
variables which make our task easier but we’re still not fully automated.

We will now inspect the actions we have been taking manually, starting with the test we
do to see if the mid-point of the interval is too high or too low and what we do as a
result of that test.


We square middle and test it for being larger than 2.0 (the number whose root we
want).

If that test returns True (i.e. it is larger) then we change the upper bound (in variable
upper) to have the same value as the mid-point (in variable middle), and otherwise
(if it returns False) we change the lower bound (in variable lower) to match the
mid-point (variable middle).


if ... then ... else ...

if      middle**2 > 2.0

then    upper = middle

else    lower = middle

In computing we call this the if...then...else... construct.

We run a test (middle**2 > 2.0) and if it returns True then we do something
(upper = middle) and otherwise (“else”) we do a different something
(lower = middle).


if middle**2 > 2.0 :        keyword “if”, condition, colon
    upper = middle          indentation, “True” action
else :                      keyword “else”, colon
    lower = middle          indentation, “False” action

So now let’s meet our first piece of serious Python syntax.

We take the Python for the test (“middle**2 > 2.0”) and precede it with the Python
keyword “if”. Then we follow it with a colon, “:”. The word “if” and the colon indicate
that an if...then...else... structure is about to start.

After this comes the set of instructions that are to be obeyed if the test returns True,
the “then-block”. There is no explicit keyword for “then”; whatever follows the if line is
the then-block. All the lines that belong in the then-block are indented. They are set in
by a number of spaces (we use four). There can be multiple lines; so long as they are
all indented they all belong in the then-block.

At the end of the then-block comes the keyword “else” followed by another colon.
This line is not indented, but instead lines up with the if. It does not belong in the
then-block, but rather marks the transition from the then-block to the “else-block”, the
set of lines to be run if the test returns False.

Then comes the else-block itself. This is indented again, and should be indented by the
same number of spaces as the then-block. Again, every indented line counts as part
of the else-block and the first unindented line (lining up with if and else) marks the
end of the whole if...then...else... construct and is obeyed regardless of the test’s
result.

It’s worth noting that the colon at the end of a line is always followed by an indented
block. We’ll see that pattern again (and again...).


Example script: middle1.py


lower = 1.0
upper = 2.0

middle = (lower+upper)/2.0

if middle**2 > 2.0 :
    print('Moving upper')
    upper = middle
else :
    print('Moving lower')
    lower = middle

print(lower)
print(upper)

So what does that look like in practice?

The script middle1.py contains a Python script that does the first iteration of our
square root bisection system.

We will step through it one block at a time and then run it to see how it behaves.


Example script: before

lower = 1.0                          Set-up prior
upper = 2.0                          to the test.

middle = (lower+upper)/2.0

if middle**2 > 2.0 :
    print('Moving upper')
    upper = middle
else :
    print('Moving lower')
    lower = middle

print(lower)
print(upper)

The script starts with a straightforward unindented block. These are just lines of
Python that get executed. These lines set things up for the test that follows.

The first two lines are the initial bounds of the interval containing the square root of
two:

lower = 1.0
upper = 2.0

The third line is the first of the steps we will eventually repeat, the creation of a
mid-point:

middle = (lower+upper)/2.0

After this block we are ready to move on to the if...then...else... section.


Example  script:  if... 


lower = 1.0
upper = 2.0

middle = (lower+upper)/2.0

if middle**2 > 2.0 :                 keyword “if”, condition, colon
    print('Moving upper')
    upper = middle
else :
    print('Moving lower')
    lower = middle

print(lower)
print(upper)

The next line starts with the if keyword, followed by the test, and ends with the colon:

if middle**2 > 2.0 :

This starts the if...then...else... construct.

The question we want to ask is whether the mid-point's value is larger than the square
root of two.

Because we don't know the square root of two (yet) we set the equivalent test:

Is the square of the mid-point's value greater than two?

That’s the Python “middle**2 > 2.0”.


Example  script:  then... 


lower = 1.0
upper = 2.0

middle = (lower+upper)/2.0

if middle**2 > 2.0 :
    print('Moving upper')            Four spaces’ indentation
    upper = middle                   The “True” instructions
else :
    print('Moving lower')
    lower = middle

print(lower)
print(upper)

Immediately after the if line comes the then-block. Note that this can include more
than one line. In this case we have two lines:

    print('Moving upper')
    upper = middle

Each is indented by the same amount: four spaces. Actually, you can use any amount
of indentation so long as you are consistent, but four spaces is most common in the
Python world and we would encourage you to stick to that. Python itself only insists
that every line within a block is indented by the same amount, but using the same
number of spaces everywhere avoids confusion.


Example script: else...

lower = 1.0
upper = 2.0

middle = (lower+upper)/2.0

if middle**2 > 2.0 :
    print('Moving upper')
    upper = middle
else :                               keyword “else”, colon
    print('Moving lower')            Four spaces’ indentation
    lower = middle                   The “False” instructions

print(lower)
print(upper)

Next comes the line

else :

This is unindented so it marks the end of the then-block and the start of the else-block.
Again notice that a line that ends with a colon is followed by an indented block:

    print('Moving lower')
    lower = middle

The else-block consists of two lines, each indented by the same amount as the
then-block (by four spaces in our examples).


Example  script:  after 


lower = 1.0
upper = 2.0

middle = (lower+upper)/2.0

if middle**2 > 2.0 :
    print('Moving upper')
    upper = middle
else :
    print('Moving lower')
    lower = middle

print(lower)                         Not indented
print(upper)                         Run regardless of the test result.

Finally there are some unindented lines at the end of the script. Because they are
unindented they do not count as part of the else-block and are run regardless of the
result of the test. The if...then...else... construct is over before they start.


Example  script:  running  it 

lower = 1.0
upper = 2.0

middle = (lower+upper)/2.0

if middle**2 > 2.0 :
    print('Moving upper')
    upper = middle
else :
    print('Moving lower')
    lower = middle

print(lower)
print(upper)

$ python middle1.py                  Unix prompt
Moving upper
1.0
1.5
$

We can run this script. It automates for us the first iteration of the process we were
doing manually before.
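The run above only exercises the then-block. As a sketch (this is not one of the course's prepared scripts), we can start from the interval that the first iteration left us with and watch the else-block run instead.

```python
# Bounds as they stand after the first iteration.
lower = 1.0
upper = 1.5

middle = (lower + upper) / 2.0      # 1.25

if middle**2 > 2.0:
    print('Moving upper')
    upper = middle
else:
    # 1.25**2 is 1.5625, which is not greater than 2.0, so this block runs.
    print('Moving lower')
    lower = middle

print(lower)    # 1.25
print(upper)    # 1.5
```

Exactly one of the two indented blocks is run each time; the two print lines at the end are run whichever it was.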


Progress 

Run a test

Do something
if it succeeds.

Do something
else if it fails.

Colon... Indentation

if test :
    something
else :
    something else


So that was the if...then...else... construct.

Note that what lies between the “if” and the “:” is evaluated to a simple Boolean
value (True or False). It can be anything that evaluates like that. It can be a test (the
most common case), but it can also be anything else that can be evaluated to a
Boolean (including the literal values True and False and any Boolean combination of
them).
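A short sketch of those other possibilities follows. The names is_odd, is_small, description and reached are all invented for this illustration.

```python
number = 7

is_odd = (number % 2 == 1)     # a test whose result has a name: True
is_small = (number < 5)        # another test: False

# A Boolean combination of named values can sit between "if" and the colon.
if is_odd and not is_small:
    description = 'odd and not small'
else:
    description = 'something else'
print(description)             # odd and not small

# So can a literal value, although then the outcome is never in doubt.
if True:
    reached = 'then-block'
else:
    reached = 'else-block'
print(reached)                 # then-block
```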


Exercise 

Four short Python scripts:

ifthenelse1.py
ifthenelse2.py
ifthenelse3.py
ifthenelse4.py

1. Read the file.

2. Predict what it will do.

3. Run it.

Recall that Python orders strings alphabetically and that the “%” operator returns the
remainder, so “3%2” is 1 because 2 divides into 3 with a remainder of 1.
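The sketch below is not one of the four exercise scripts; it simply shows the two reminders in action, using made-up names (parity, earlier) and all-lowercase words so that alphabetical order is unambiguous.

```python
number = 3
remainder = number % 2          # 2 divides into 3 with a remainder of 1

if remainder == 1:
    parity = 'odd'
else:
    parity = 'even'
print(parity)                   # odd

# Strings are compared alphabetically: 'apple' comes before 'banana'.
if 'apple' < 'banana':
    earlier = 'apple'
else:
    earlier = 'banana'
print(earlier)                  # apple
```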


So  far  we  have  scripted  a  single  iteration.  However,  as  the  name  “iteration”  suggests 
we  want  to  iterate  it:  run  it  time  after  time.  Our  if... then... else...  construct  sits  in  the 
middle  of  another  construct  that  runs  it  repeatedly.  That’s  what  we  want  to  do  next. 


Repeating  ourselves 

Run the script

Read the results

Edit the script

(...and round again.)

Not the way to do it!


Now we could take our script, middle1.py, and run it, edit it to put back the results
and run it again. This would be silly.


Repeating ourselves

Looping for ever?

Keep going while...         while condition :
                                action 1
                                action 2
...then stop.               afterwards

What we want is some Python syntax that lets us run a block of commands
repeatedly. We probably don’t want to run for ever, though. Python’s way to deal with
this is to run some commands while some test returns True.

The command it uses for this is called, naturally enough, “while” and we will use it in
a style similar to if....


while  vs.  until 

Repeat until...             Repeat while...

number == 0                 number != 0

upper - lower < target      upper - lower >= target

condition                   not condition

Be careful. It's very easy to think “loop until”. Python thinks in terms of “loop while”.

Here are some examples of “repeat until...” tests converted into the equivalent “repeat
while...” tests. They are essentially opposites. Note that while the opposite of “is
equal” (==) is obviously “is not equal” (!=) the opposite of “is less than...” (<) is “is
greater than or equal to...” (>=). Don’t forget the “or equal to” bit.

Generally, any Python test can be preceded by the logical negation operator “not”.
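As a sketch of the first row of that table, here is “repeat until the number is zero” written twice: once with the opposite test and once by putting “not” in front of the original test. The names countdown, steps and steps_with_not are assumptions made for this example.

```python
# "Repeat until countdown == 0" becomes "repeat while countdown != 0".
countdown = 3
steps = 0
while countdown != 0:
    countdown = countdown - 1
    steps = steps + 1
print(steps)                # 3

# The same loop, negating the "until" test with "not".
countdown = 3
steps_with_not = 0
while not (countdown == 0):
    countdown = countdown - 1
    steps_with_not = steps_with_not + 1
print(steps_with_not)       # 3
```

Both loops run the same number of times, which is what we would expect if the two tests really are equivalent.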


Example  script 


number = 1
limit = 1000

while number < limit :
    print(number)
    number = number * 2

print('Finished!')

doubler1.py

Let’s examine this loop construct in isolation first before returning to our bisection
example.

There is a script prepared for you which takes a number, starting at 1, and doubles it
repeatedly until it goes over 1,000. We'll take it bit by bit.


Example  script:  before 


number = 1                           Set-up prior
limit = 1000                         to the loop.

while number < limit :
    print(number)
    number = number * 2

print('Finished!')

doubler1.py


We start with the preamble. This has nothing to do with the looping. This is just set-up.


Example  script:  while... 


number = 1
limit = 1000

while number < limit :               keyword “while”, condition, colon
    print(number)
    number = number * 2

print('Finished!')

doubler1.py

Now we start the looping.

The introductory keyword is “while”. This kicks off the whole construct.

The test is that number is less than the limit. Recall that this test must be True for the
looping to continue. Because we are increasing number each time the loop repeats
(we’re doubling it) and leaving limit unchanged eventually this test will return
False.

The line ends with a colon, just like “if” and “else”.

Recall that while we might think about when the looping should stop (“until...”), Python
thinks about when it should keep going (“while...”).


Example  script:  loop  body 

number = 1
limit = 1000

while number < limit :
    print(number)                    Four spaces’ indentation
    number = number * 2              loop body

print('Finished!')

doubler1.py

The while line is followed by the body of the loop.

This is the section that is going to be repeated while the test continues to return True.
Each line of the loop-block is indented by the standard amount (four spaces for us).
Note that, again, indentation follows a colon.


Example  script:  after 


number = 1
limit = 1000

while number < limit :
    print(number)
    number = number * 2

print('Finished!')                   Not indented
                                     Run after the looping is finished.

doubler1.py


After the loop block there is an unindented line. This line will not be run until the
looping is finished.


Example script: running it

number = 1
limit = 1000

while number < limit :
    print(number)
    number = number * 2

print('Finished!')

> python doubler1.py
1
2
4
8
16
32
64
128
256
512
Finished!

So let's run it.
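Notice that the last number printed is 512, yet the loop only stops once number has reached 1024. A small variation on the script (a sketch, with the made-up name doublings) makes that visible by counting the passes and printing number after the loop has finished.

```python
number = 1
limit = 1000
doublings = 0

while number < limit:
    number = number * 2
    doublings = doublings + 1

# These lines are unindented, so they run once the test has returned False.
print(number)       # 1024, the first value that fails the test
print(doublings)    # 10
```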


Progress 

Run a test

Do something
if it succeeds.

Finish if
it fails.

Go back to the test.

while test :
    something

(Flowchart: run the test; if True, do something and go back to the test; if False, finish.)


Exercise

Four short Python scripts:

while1.py
while2.py
while3.py
while4.py

1. Read the file.

2. Predict what it will do.

3. Run it.

n.b. [Ctrl]+[C] will kill
a script that won’t
stop on its own.

Don't worry too much if your mental arithmetic isn't up to while4.py.


upper = 2.0
lower = 1.0

(Flowchart: inside a while... loop, calculate middle = (upper + lower)/2.0; then an
if...then...else... tests middle**2 > 2.0 and sets upper = middle if True or
lower = middle if False; finally print(middle).)

Now let’s return to our bisection example. We are going to put the if...then...else...
construct (which narrows our interval) inside a while... construct which will keep
repeating that narrowing until we have our answer.


Combining  while...  and  if... 

if...then...else... inside while...

Each if...then...else... improves the approximation
How many if...then...else... repeats should we do?

What’s the while... test?

We will take the logic of building our while... loop very slowly this first time.

The if...then...else... improves the interval by a factor of two (i.e. it narrows the
interval to a half of its previous size). How many of these iterations do we need to do?
In other words, what’s the test that needs to go after the while keyword?


Writing  the  while...  test 

Each if...then...else... improves the approximation

How much do we want it improved?

How small do we want the interval?
upper - lower

So, ignoring the Python for a moment, we need to decide how much we want the
approximation improved. The quality of the approximation is given by the width of the
interval. The width of the interval is simply the upper bound minus the lower bound. In
Python the size of the interval is just upper - lower.


Writing  the  while...  test 

What is the interval?  upper - lower

How small do we want the interval?  1.0e-15

Keep going while the interval is too big:
while upper - lower > 1.0e-15 :

Our interval’s size is given by upper - lower.

We will make up a target quality. We’ll use 10⁻¹⁵ for this example. It can be any very
small number. Recall that in Python this is written as 1.0e-15.

So the test that the interval is too big (and that we must continue reducing it) is that

upper - lower > 1.0e-15

This means that the full while line is

while upper - lower > 1.0e-15 :

and this is the line that we will use.
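To get a feel for how many repeats that test allows, here is a sketch that ignores the square root altogether and just halves an interval of width 1.0 until the same kind of test fails. The names width, target and halvings are assumptions for this illustration.

```python
width = 1.0          # the starting interval runs from 1.0 to 2.0
target = 1.0e-15
halvings = 0

# Keep going while the interval is too big.
while width > target:
    width = width / 2.0
    halvings = halvings + 1

print(halvings)      # 50
print(width)         # a little under 1.0e-15
```

So a target of 1.0e-15 costs only fifty halvings, which is why so small a number is a reasonable choice.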


lower = 1.0
upper = 2.0

while upper - lower > 1.0e-15 :      Keep going while the approximation is too coarse
    middle = (upper+lower)/2.0       Single indentation
    if...then...else...

print(middle)                        No indentation


So let’s start building our script.

We start with the outer while loop.

We set up the initial values of the lower and upper bounds prior to any refinement.

The next line is the while... line as described before.

Then we start the indented block to be repeated while the approximation is too coarse.
This starts with the creation of the mid-point and will continue with the if...then...
else... section.

Finally we announce that we are done by printing the eventual mid-point value. This is
entirely unindented so is only run after the while... loop is finished.

All we need to do now is insert the if...then...else... block.


lower = 1.0
upper = 2.0

while upper - lower > 1.0e-15 :
    middle = (upper+lower)/2.0
    if middle**2 > 2.0 :
        print('Moving upper')        Double indentation
        upper = middle
    else :
        print('Moving lower')        Double indentation
        lower = middle

print(middle)


And it’s actually trivial. We simply insert it. However, it comes one level indented in its
entirety by the while... loop. So all we have done is to take the if...then...else... we
already have and move it over one more level of indentation.

The if... and else lines are indented once by the while... loop they are in. The
then-block and the else-block are doubly indented (so by eight spaces) because they
are indented once for the while... and once for the if...then...else....


Running the script

lower = 1.0
upper = 2.0

while upper - lower > 1.0e-15 :
    middle = (upper+lower)/2.0
    if middle**2 > 2.0 :
        print('Moving upper')
        upper = middle
    else :
        print('Moving lower')
        lower = middle

print(middle)

> python root2.py
Moving upper
Moving lower
Moving lower
Moving upper

Moving upper
Moving upper
Moving lower
Moving lower
1.41421356237


And look! It works.
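The same structure also automates the earlier homework. The following is a sketch rather than a course script: it uses the homework's starting bounds of 1.0 and 3.0, tests against 3.0, and leaves out the progress messages to keep the output short.

```python
lower = 1.0
upper = 3.0

while upper - lower > 1.0e-15:
    middle = (upper + lower) / 2.0
    if middle**2 > 3.0:
        # Mid-point is too high, so lower the upper bound.
        upper = middle
    else:
        # Mid-point is too low (or exact), so raise the lower bound.
        lower = middle

# Unindented, so run once only, after the loop has finished.
print(middle)       # approximately 1.7320508
```

Only the two starting bounds and the number in the test changed; the while... and if...then...else... skeleton is identical to the one used for the square root of two.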


Indentation 

c.f. “legalese”
§5(b)(ii)

Other languages...
{...}
IF...END IF
if...fi, do...done


Let’s return to the issue of this nested indentation. The best way to think of it is as an
analogue of “legalese” where regulations have paragraphs, sub-paragraphs,
sub-sub-paragraphs and so on, each of which is more indented than the level before.

But its use of indentation is one of Python’s most controversial features. All languages
need some mechanism within the language to mark the start and end of these nested
blocks of code. C and derived languages use left and right curly brackets (“braces”).
Fortran uses expressions like IF and END IF. The shell (the language you type at the
Unix prompt) has if statements that end with “fi” (“if” backwards). Its analogue of
the while loop uses “do” and “done” to mark the start and end of the loop-block. It
would have used “od” but that was already taken by the Unix “octal dump” command.
What is interesting is that when programmers work in these languages they typically
add multiple levels of indentation to make them easier to read. Python just takes
this one step further and makes the indentation syntactically significant.


Indentation: level 1

lower = 1.0
upper = 2.0

while upper - lower > 1.0e-15:
    middle = (upper+lower)/2.0
    if middle**2 > 2.0:
        print('Moving upper')
        upper = middle
    else:
        print('Moving lower')
        lower = middle

print(middle)

while line: colon starts the block.
Indented lines: indentation marks the extent of the block.
Final print line: unindented line, end of block.


So  let’s  look  closely  at  the  indentation  in  the  script.  The  while  line  ends  with  a  colon 
and  is  followed  by  an  indented  block.  The  indentation  marks  the  extent  of  the  block. 
The  first  line  that’s  not  indented  is  the  first  line  beyond  the  end  of  the  block. 


Indentation: level 2

lower = 1.0
upper = 2.0

while upper - lower > 1.0e-15:
    middle = (upper+lower)/2.0
    if middle**2 > 2.0:
        print('Moving upper')
        upper = middle
    else:
        print('Moving lower')
        lower = middle

print(middle)

if line: colon... indentation
“else” unindented
else line: colon... indentation


Within that indented block we have two more lines that end with a colon. Each introduces a block that is indented with respect to that line, which is itself already indented one level.


Arbitrary nesting

Not just two levels deep
As deep as you want

Any combination
    if... inside while...
    while... inside if...
    if... inside if...
    while... inside while...

We have used the example of an if... then... else... construct inside a while... loop with nested indentation. We can do it with any Python construct that uses indentation, nested arbitrarily and arbitrarily deep.
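
As a further illustration of arbitrary nesting, here is a minimal sketch of a while... loop inside another while... loop. The names row, column and total are made up for this example and are not part of the course scripts.

```python
# A while... loop nested inside another while... loop.
# (row, column and total are illustrative names only.)
total = 0
row = 1
while row <= 3:
    column = 1
    while column <= 3:
        total = total + row * column   # doubly indented: inside both loops
        column = column + 1
    row = row + 1                      # singly indented: inside the outer loop only

print(total)                           # unindented: runs once, after both loops
```

The inner loop runs in full for each pass of the outer loop, so the doubly indented lines are executed nine times in all.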


e.g. if... inside if...

number = 20

if number % 2 == 0:
    if number % 3 == 0:
        print('Number divisible by six')
    else:
        print('Number divisible by two but not three')
else:
    if number % 3 == 0:
        print('Number divisible by three but not two')
    else:
        print('Number indivisible by two or three')


For  example,  we  can  nest  one  if... then... else...  inside  another. 


Progress 

colon... indentation 

Indented  blocks 

Nested  constructs 

Levels  of  indentation 

Exercise

Write a script from scratch: collatz.py

1. Start with number set to 7.
2. Repeat until number is 1.
3. Each loop:
   3a. If number is even, change it to number/2.
   3b. If number is odd, change it to 3*number+1.
   3c. Print number.

This is an extended exercise. You may need to take the full fifteen minutes to write it.

We are going to implement a script that investigates a bizarre mathematical phenomenon: Take any positive whole number. Apply the iteration shown in the slide. The “Collatz Conjecture” says that you will always end up looping through the three numbers 4 - 2 - 1 - 4 - ....

Starting with 7 you should see this series of numbers:

22 - 11 - 34 - 17 - 52 - 26 - 13 - 40 - 20 - 10 - 5 - 16 - 8 - 4 - 2 - 1.

Once  you’ve  got  it  working,  try  starting  with  47  for  a  longer  list  of  numbers,  going  much 
higher. 

Hints: 

1.  The  test  to  see  if  a  number  is  even  is  to  see  whether  or  not  its  remainder  is  zero 
when  divided  by  two: 

number  %  2  ==  0 

2.  Changing  number  to  number/2  is 

number  =  number/2 

3. Changing number to 3*number+1 is

number  =  3*number  +  1 

You  just  need  to  add  the  while...  and  if... then... else...  syntax. 


Comments 

Reading  Python  syntax 

middle  =  (upper  +  lower)/2.0 

“What  does  the  code  do?” 

Calculate  the  mid-point. 

“Why  does  the  code  do  that?” 

Need  to  know  the  square  of  the 
mid-point’s  value  to  compare  it 
with  2.0  whose  root  we’re  after. 


We’re now writing real Python scripts. There’s one thing we can add that will make life a lot easier in the long term: comments.

We  can  read  Python  syntax.  We  can  see  a  line  such  as 
middle  =  (upper  +  lower)/2.0 

and  determine  what  it  is  doing.  But  why  is  it  doing  it?  Why  do  we  want  the  value  of  the 
mid-point? 

A  comment  is  a  piece  of  text  in  the  script  which  does  not  get  executed  by  Python  and 
which  can  carry  a  message  explaining  the  why  of  the  script. 


Comments

#    The “hash” character, a.k.a. “sharp”, “pound”, “number”

Lines starting with “#” are ignored.
Partial lines too.


Comments  in  Python  are  introduced  by  the  “#”  character,  which  we  will  pronounce 
“hash”.  The  comment  can  be  a  whole  line  or  part  of  a  line.  Everything  from  the  hash  to 
the  end  of  the  line  is  ignored. 
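
The short sketch below shows both forms of comment side by side. The variable label and the commented-out line are invented purely for illustration.

```python
# This whole line is a comment, so Python ignores it.
lower = 1.0    # Everything from the hash to the end of this line is ignored.
upper = 2.0

# The next line is "commented out", so it never runs:
# upper = 100.0

# A hash inside quotes is just part of the string, not the start of a comment.
label = 'A # inside quotes is part of the string'

print(upper - lower)
print(label)
```

Because the assignment of 100.0 is hidden behind a hash, upper keeps the value 2.0.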


Comments  —  explanation 

# Set the initial bounds of the interval. Then
# refine it by a factor of two each iteration by
# looking at the square of the value of the
# interval's mid-point.
# Terminate when the interval is 1.0e-15 wide.

lower = 1.0    # Initial bounds
upper = 2.0

while upper - lower > 1.0e-15:
    ...


Comments  can  be  used,  as  suggested,  to  give  a  “why”  for  a  script. 


Comments  — 

authorship 

# (c) Bob Dowling, 2010
# Released under the FSF GPL v3

# Set the initial bounds of the interval. Then
# refine it by a factor of two each iteration by
# looking at the square of the value of the
# interval's mid-point.
# Terminate when the interval is 1.0e-15 wide.

lower = 1.0    # Initial bounds
upper = 2.0

They can also be used to enter copyright and licensing statements.


Comments  —  source  control 


# (c) Bob Dowling, 2010
# Released under the FSF GPL v3
# $Id: root2.py,v 1.1 2010/05/20 10:43:43 rjd4 $

# Set the initial bounds of the interval. Then
# refine it by a factor of two each iteration by
# looking at the square of the value of the
# interval's mid-point.
# Terminate when the interval is 1.0e-15 wide.

If the script is being edited you can keep a version number or “last edited” date in a comment. Some version control systems (for example RCS and CVS, which expand the $Id$ keyword shown above) can do this for you automatically.


Comments  —  logging 


# (c) Bob Dowling, 2010
# Released under the FSF GPL v3
# $Id: root2.py,v 1.2 2010/05/20 10:46:46 rjd4 $
# $Log: root2.py,v $
# Revision 1.2  2010/05/20 10:46:46  rjd4
# Removed intermediate print lines.
#
# Set the initial bounds of the interval. Then
# refine it by a factor of two each iteration by
#


If  the  script  is  being  edited  you  can  also  keep  a  log  of  changes  in  a  comment.  This  is 
not  just  a  “why”  but  a  “how  it  got  to  be  this  way”  comment.  Again,  some  version  control 
systems  can  automatically  maintain  such  a  logging  comment. 


Comments

Reading someone else’s code.               Writing code for someone else.

Reading your own code six months later.    Writing code you can come back to.

Perhaps the best way to think of comments is this:

If  you  were  given  a  script  written  by  someone  else,  what  comments  would  you  like  to 
see  to  make  your  life  easier?  Those  are  the  comments  that  you  should  write  so  that 
you  can  pass  your  script  to  somebody  else. 

You may think that your script never will be passed on to someone else. However, you may be that somebody. Write a script, put it away and don’t come back to it for six months. Next time you read it, it might as well have been written by somebody else.


Exercise

1. Comment your script: collatz.py

   Author     # Bob Dowling
   Date       # 2010-05-20
   Purpose    # This script
              # illustrates ...

2. Then check it still works!

3 minutes

Remember the collatz.py script you wrote for an exercise? Add some comments to it.

Comment  lines  are  ignored  by  the  Python  interpreter.  They  should  have  no  effect  on 
the  execution  of  your  code.  Make  sure  your  script  still  works  afterwards. 


Lists

['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

[2, 3, 5, 7, 11, 13, 17, 19]

[0.0, 1.5707963267948966, 3.1415926535897931]


Now  take  a  deep  breath.  We  are  going  to  introduce  a  new  type  of  Python  object  that  is 
one  of  the  most  pervasive  types  in  all  of  Python.  Very  many  Python  procedures  rely  on 
it. 

It's  called  a  “list”,  a  finite  sequence  of  items  (often  called  “elements”). 


Lists  —  getting  it  wrong 


A script that prints the names of the
chemical elements in atomic number order.

print('hydrogen')
print('helium')
print('lithium')
print('beryllium')
print('boron')
print('carbon')
print('nitrogen')
print('oxygen')

X

Repetition of “print”
Inflexible

You can usually spot when you ought to be using a list in Python because you get very repetitive scripts that do something to an item, then do the same thing to another item, then to a third, a fourth and so on.

For example, rather than have a ninety-two line script that has a print statement for each of the chemical element names we would be better off with a list of the ninety-two names and an instruction that said “print them”.


Lists  —  getting  it  right 

A  script  that  prints  the  names  of  the 
chemical  elements  in  atomic  number  order. 

1.  Create  a  list  of  the  element  names 

2.  Print  each  entry  in  the  list 



So  let's  look  at  how  that's  done. 

We  will  start  by  creating  one  of  these  lists,  containing  the  element  names. 

Then  we  will  introduce  a  Python  construct  that  lets  us  do  something  to  each  element 
of  the  list. 


Creating a list

>>> [ 1, 2, 3 ]              Here’s a list
[1, 2, 3]                    Yes, that’s a list

>>> numbers = [ 1, 2, 3 ]    Attaching a name to the list

>>> numbers                  Using the name
[1, 2, 3]

First, let's create a literal list (i.e. one directly typed in).

A list is a series of items, separated by commas, contained in square brackets. That's how Python represents it when it's output too.

A list is just another Python object so we can assign it a name too if we want.


Anatomy of a list

Square brackets at ends
Individual element

[ 'alpha', 'beta', 'gamma', 'delta' ]

Elements separated by commas


This  is  all  there  is  to  the  representation  of  a  list. 

Spaces  either  side  of  the  square  brackets  or  commas  are  ignored. 


Square brackets in Python

[...]    Defining literal lists

e.g.
>>> primes = [2,3,5,7,11,13,17,19]


We  are  going  to  meet  square  brackets  a  lot  in  the  remainder  of  the  course  so  we  will 
start  building  up  a  list  of  the  various  things  they  are  used  for. 

First  use:  they  are  used  to  mark  the  ends  of  a  literal  list. 


Order of elements

No “reordering”

>>> [ 1, 2, 3 ]          >>> [ 3, 2, 1 ]
[1, 2, 3]                [3, 2, 1]

>>> [ 'a', 'b' ]         >>> [ 'b', 'a' ]
['a', 'b']               ['b', 'a']

Note that the elements of a list have a specific order. It is the order they are defined with and there is no automatic sorting or reordering based on the values of the items in the list.


Repetition

No “uniqueness”

>>> [ 1, 2, 3, 1, 2, 3 ]
[1, 2, 3, 1, 2, 3]

>>> [ 'a', 'b', 'b', 'c' ]
['a', 'b', 'b', 'c']

Also note that you are perfectly well allowed to have values appear more than once in a list. Repeats are not stripped out.
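
Both properties can be checked in a short script. This is only a sketch; the names forwards, backwards and repeated are invented for the example.

```python
# Order matters: the same items in a different order make a different list.
forwards = [1, 2, 3]
backwards = [3, 2, 1]
print(forwards == backwards)

# Repeats are kept: both copies of 'b' stay in the list.
repeated = ['a', 'b', 'b', 'c']
print(repeated)
```

The comparison prints False, and the second list is printed with both copies of 'b' still present.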


Concatenation  —  1 

“+” used to join lists.

>>> [ 1, 2, 3 ] + [ 4, 5, 6, 7 ]
[1, 2, 3, 4, 5, 6, 7]

>>> ['alpha','beta'] + ['gamma']
['alpha', 'beta', 'gamma']

So what can we do with lists?

Well, we can join them together in a process called “concatenation”. Just as we did with strings, we can concatenate them with the “+” sign.
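
One point worth seeing for yourself is that concatenation builds a new list and leaves the two original lists alone. A small sketch, using the made-up names first, second and joined:

```python
first = [1, 2, 3]
second = [4, 5, 6, 7]

joined = first + second   # a new list holding the items of both

print(joined)
print(first)              # still the original three items
print(second)             # still the original four items
```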


Concatenation  —  2 


>>> [ 1, 2, 3 ] + [ 3, 4, 5, 6, 7 ]      “3” appears twice
[1, 2, 3, 3, 4, 5, 6, 7]                 “3” appears twice

Again, notice that there is no automatic uniqueness. If a concatenation brings two identical values together you end up with a list containing both of them.


Empty list

>>> []
[]

>>> [2,3,5,7,11,13] + []
[2, 3, 5, 7, 11, 13]

>>> [] + []
[]

There's  nothing  to  say  that  you  can't  have  an  empty  list.  A  pair  of  square  brackets  with 
nothing  between  them  (spaces  are  still  ignored)  gives  an  empty  list. 


Progress

Lists                           [23, 29, 31, 37, 41]
Shown with square brackets
Elements separated by commas

Concatenation                   [23, 29] + [31, 37, 41]

Empty list                      []

Exercise

Predict what these will create. Then check.

1. [] + ['a', 'b'] + []
2. ['c', 'd'] + ['a', 'b']
3. [2, 3, 5, 7] + [7, 11, 13, 17, 19]

2 minutes


How long is the list?

Function name
>>> len([10, 20, 30])      Round brackets
3

Now let's start doing some things with our lists. We can ask how long our list is with a new Python function called “len()”.


How long is a string?

Same function
>>> len('Hello, world!')
13

Recall:
Quotes say “this is a string”.
They are not part of the string.


We  can  also  ask  for  the  length  of  a  string. 

This  counts  the  characters  in  the  string.  Recall  that  the  quotes  simply  indicate  to 
Python  that  this  is  a  string:  they  are  not  part  of  the  string. 
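
A few more cases, as a quick sketch (the name greeting is invented for the example):

```python
greeting = 'Hello, world!'

print(len(greeting))       # 13 characters; the quotes are not counted
print(len(''))             # 0: the empty string has no characters
print(len([10, 20, 30]))   # 3 elements
print(len([]))             # 0: the empty list has no elements
```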


How long is a number?

>>> len(42)
Traceback (most recent call last):          Error message
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()

Numbers don’t have a “length”.

Note that we can't ask for the length of a number. It is a meaningless concept.


Our first look at a function

Function name
>>> len([10, 20, 30])      Round brackets
3                          “Returned” value

Function “argument”
One argument

len() is our first real Python function, as print is a bit special, so it pays to take a close look.

The function name is followed by round brackets (“parentheses”) which contain everything that is going to be fed into the function for it to calculate a result. In this case there is only one argument. It is a list (which contains elements of its own) but the one list is the one argument. Triggering the use of a function like this is called “calling the function”.

The function calculates a value from the input(s) it is given in its brackets, and Python then uses this value in place of the function call. We say that “the function returns a value”.
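
Because the returned value simply takes the place of the call, it can be given a name or used inside a larger expression. A minimal sketch, with sizes and count as made-up names:

```python
sizes = [10, 20, 30]

count = len(sizes)        # attach a name to the returned value
print(count)

print(len(sizes) + 1)     # use the returned value in a calculation
print(len(sizes) == 3)    # use the returned value in a test
```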


Progress

Length of a list:      Number of elements
Length of a string:    Number of characters
Length function:       len()

Exercise: lengths of strings

1. Predict what these Python snippets will return.
2. Then try them.

(a) len('Goodbye, world!')
(b) len('Goodbye, ' + 'world!')
(c) len('Goodbye, ') + len('world!')

3 minutes for both

There are two slides of exercises to do in this exercise segment. The first covers lengths of strings...

Exercise: lengths of lists

1. Predict what these Python snippets will return.
2. Then try them.

(d) len(['Goodbye, world!'])
(e) len(['Goodbye, '] + ['world!'])
(f) len(['Goodbye, ']) + len(['world!'])

3 minutes for both

...and the second the lengths of lists.


Picking  elements  from  a  list 

>>> letters = ['a', 'b', 'c', 'd']


We've  looked  at  lists  as  a  whole,  but  we  still  need  to  extract  individual  items  from 
them. 


The  first  element  in  a  list 

>>> letters[0]      Count from zero
'a'                 “Index”

The key to extracting individual elements is that they each have a position in the list. By giving the position we can access the item. This position is called the index of the item in the list.

Python counts its indices from zero.

(This  is  not  uncommon  in  programming  languages.  The  “count  from  zero”  vs.  “count 
from  one”  philosophical  battles  have  been  fought  long  and  hard.) 

What  matters  from  our  perspective  is  how  Python  refers  to  the  index.  It  does  it  by 
taking  the  list  (or,  more  typically,  the  list's  name)  and  following  it  with  the  index  in 
square  brackets. 
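
Put together as a script, and assuming the same four-letter list used on the slides, indexing looks like this:

```python
letters = ['a', 'b', 'c', 'd']

print(letters[0])   # item number zero: 'a'
print(letters[1])   # item number one: 'b'
print(letters[3])   # item number three: 'd'
```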


Square brackets in Python

[...]          Defining literal lists
numbers[N]     Indexing into a list

e.g.
>>> primes = [2,3,5,7,11,13,17,19]
>>> primes[0]
2

See, I told you that we would see square brackets a lot.

The fact that lists use square brackets for literal lists as well as indices is just a coincidence. Later we will see square brackets used for indices on an object created with curly brackets.


“Element number 2”

>>> letters[2]
'c'               The third element

letters (variable) refers to a list:
    letters[0]    str a
    letters[1]    str b
    letters[2]    str c
    letters[3]    str d

Remember that Python counts from zero.

A  useful  trick  of  language  is  to  avoid  talking  about  “the  first  item”  or  “the  second  item” 
and  to  refer  to  “item  number  zero”  or  “item  number  one”. 


Going  off  the  end 

>>> letters[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

(Diagram: the list has entries letters[0] to letters[3]; there is no letters[4].)

What happens if you give an index that runs off the end of a list? You get an error message.


Maximum index vs. length

>>> len(letters)
4                 Maximum index is 3!

    letters[0]    str a
    letters[1]    str b
    letters[2]    str c
    letters[3]    str d
                  (4 items)

Remember that Python counts from zero. If a list has length four, it has four items in it. These are indexed 0, 1, 2, and 3 so 4 is not a valid index.
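
In other words, the largest valid index is always one less than the length. A short sketch (last_index is a name made up for this example):

```python
letters = ['a', 'b', 'c', 'd']

last_index = len(letters) - 1   # four items, so the largest valid index is 3
print(last_index)
print(letters[last_index])      # the final item in the list
```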


“Element number -1!”

>>> letters[-1]
'd'               The final element

    letters[0]                  str a
    letters[1]                  str b
    letters[2]                  str c
    letters[3]   letters[-1]    str d

But we can use negative numbers!

The index -1 refers to the last element of the list. This can be very useful.


Negative indices

>>> letters[-3]
'b'

    letters[0]   letters[-4]    str a
    letters[1]   letters[-3]    str b
    letters[2]   letters[-2]    str c
    letters[3]   letters[-1]    str d

The negative numbers actually work all the way back, though this is typically less useful than just referring to the last item with -1.
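
The pairing between negative and non-negative indices can be checked directly. This sketch assumes the same letters list as the slides:

```python
letters = ['a', 'b', 'c', 'd']

print(letters[-1])                  # 'd', the last item
print(letters[-1] == letters[3])    # True: two names for the same entry
print(letters[-4] == letters[0])    # True

# For a valid negative index -k, letters[-k] is letters[len(letters) - k].
print(letters[-3] == letters[len(letters) - 3])
```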


Going  off  the  end 

>>> letters[-5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range


But  you  can't  go  too  far  back  just  like  you  can't  go  too  far  forwards. 


Valid range of indices

>>> len(letters)
4

-4  -3  -2  -1  0  1  2  3

    letters[0]   letters[-4]    str a
    letters[1]   letters[-3]    str b
    letters[2]   letters[-2]    str c
    letters[3]   letters[-1]    str d


There  is  always  one  zero-or-positive  index  and  one  negative  index  for  each  entry. 
An  empty  list  has  no  valid  indices. 
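
For a list of length n the valid indices therefore run from -n up to n-1. A sketch of that rule, where size, lowest and highest are names invented for the example:

```python
letters = ['a', 'b', 'c', 'd']
size = len(letters)

lowest = -size        # -4: the most negative valid index
highest = size - 1    #  3: the largest valid index

print(letters[lowest])    # the first item
print(letters[highest])   # the last item
```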


Indexing into literal lists

>>> letters = ['a', 'b', 'c', 'd']
>>> letters[3]                    Name of list, index
'd'

Legal, but rarely useful:
>>> ['a', 'b', 'c', 'd'][3]       Literal list, index
'd'

Square bracketed indices typically follow a list name. What matters is that the thing they follow is a list, whether this is via a name or directly. Putting an index after a literal list is legal but rarely useful. The author can imagine no use for it at all, but is realistic about the limits of his imagination.

Later we will meet functions that return lists. We can put the square bracketed indices directly after those function calls too.
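
As a taste of that, the sketch below indexes straight into the result of a function call. It uses Python's built-in sorted() function, which returns a new sorted list; sorted() has not been introduced in this course so far and is used here only as an illustration.

```python
numbers = [31, 23, 41, 29, 37]

# sorted() returns a new list; the index picks one item out of that list.
smallest = sorted(numbers)[0]
largest = sorted(numbers)[-1]

print(smallest)
print(largest)
```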


Assigning list elements

>>> letters                  The name attached to the list as a whole
['a', 'b', 'c', 'd']

>>> letters[2] = 'X'         The name attached to one element of the list;
                             assign a new value

>>> letters
['a', 'b', 'X', 'd']

We use this notation to set the individual values in lists as well as to get them.

Just as we can use a simple name on the left hand side of an assignment, we can use an indexed name on the left to set the value of just one element of the named list.

Lists are known as “mutable” objects because we can change individual elements within them.
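
Here is a small sketch of mutability in a script. The list primes deliberately starts with two wrong values so that there is something to correct.

```python
primes = [2, 3, 5, 8, 12]   # two of these values are wrong

primes[3] = 7               # replace item number three
primes[-1] = 11             # negative indices work for assignment too

print(primes)
print(len(primes))          # the length has not changed
```

Only the two named elements change; the rest of the list and its length stay as they were.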


Progress

Index into a list            ['x', 'y', 'z']
Square brackets for index    list[index]
Counting from zero           list[0] → 'x'
Negative index               list[-1] → 'z'
Assignment                   list[1] = 'A'

Exercise

Predict the output from the following five commands.
Then try them.

data = ['alpha', 'beta', 'gamma', 'delta']
data[1]
data[-2]
data[2] = 'gimmel'
data

3 minutes


Note  that  two  of  the  commands  in  this  exercise  don't  produce  any  output. 


Doing  something  with  a  list 

Recall  our  challenge: 

A  script  that  prints  the  names  of  the 
chemical  elements  in  atomic  number  order. 

1.  Create  a  list  of  the  element  names 

2.  Print  each  entry  in  the  list 



Let's  return  to  our  challenge.  We  wanted  a  list  of  chemical  element  names  and  then 
we  were  going  to  do  something  with  each  of  the  items  in  the  list. 

Well,  we  now  know  how  to  create  a  list  of  element  names: 
names  =  ['hydrogen',  'helium',  'lithium',...] 

so  now  we  just  need  to  learn  how  to  do  something  with  each  and  every  one  of  them. 
For  our  specific  challenge  we  want  to  print  it. 


Each list element

Given a list
Start with the first element of the list
Do something with that element
Are there any elements left? If not → Finish
Otherwise move on to the next element and do something with that one

We are going to start with a list. Our construct will take the first element of the list and do something with it. If there is a second element the construct will then move on to that item and do the same thing to it. And then to the third, the fourth and so on, until there are no elements of the list left.

(We will be reading our way through the list. We will not be removing the items as we go.)


Each list element

Do something with that element

Need to identify the “current” element
Need to identify a block of commands: another indented block

The key is the “do something with that element” phrase.

First,  we  need  some  sort  of  hook  to  identify  the  particular  element  in  question. 
Second  we  need  to  identify  the  block  of  commands  that  will  be  run  on  the  current 
element.  This  is  Python  so  the  block  will  be  marked  by  being  indented. 


The “for loop”

for letter in ['a', 'b', 'c']:
    print('Have a letter:')
    print(letter)

for, in: keywords
letter: loop variable
['a', 'b', 'c']: list
Colon at the end of the line, then indentation
Indented lines: the repeated block, using the loop variable

So,  let's  meet  the  Python  construct  we  will  be  using:  the  “for  loop”. 

It  gets  its  name  from  the  first  keyword  in  the  line:  “for”. 

This  is  followed  by  a  name.  This  should  be  a  new  name,  not  already  in  use.  The  name 
is  going  to  be  used  to  refer  to  the  elements  in  the  list,  one  at  a  time,  as  we  will  see  in  a 
moment.  In  the  case  of  the  slide  we  are  using  the  name  “letter”. 

The  variable's  name  is  followed  by  a  second  keyword  “in”.  This  is  pure  syntactic 
sugar  and  helps  the  line  read  more  like  English. 

After  “in”  comes  the  list  itself.  This  can  be  either  a  literal  list  (as  shown  in  the  slide)  or 
the  name  of  a  list,  or  (as  we  will  see  later)  a  function  whose  returned  value  is  a  list. 

At  the  end  of  the  line  comes  a  colon,  to  introduce  the  indented  section. 

Next  comes  the  set  of  commands  that  are  going  to  be  run  over  each  item  in  the  list  as 
an  indented  block.  There  can  be  as  many  lines  as  you  want;  once  the  indentation  is 
over  the  block  ends. 

Note  that  within  this  block  we  can  use  the  variable  name  created  on  the  first  line, 
“letter”  in  the  case  of  the  slide.  The  block  will  be  run  once  for  each  element  of  the 
list  and  each  time  it  is  run  the  name  will  be  associated  with  a  different  item  in  the  list. 
So,  in  the  case  on  the  slide,  the  block  of  code  will  be  run  three  times.  The  first  time  it 
is  run  the  name  “letter”  will  be  associated  with  the  value  “a”,  the  second  time  with 
“b”,  and  the  third  and  final  time  it  will  be  associated  with  “c”. 
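
The list after “in” can equally be a named list, and the block can do more than print. The following is only a sketch; primes, prime and total are names made up for the example.

```python
primes = [2, 3, 5, 7, 11]
total = 0

for prime in primes:
    # This block runs once per item; "prime" refers to each item in turn.
    total = total + prime
    print(total)

print('Finished!')   # not indented, so it runs once, after the loop
```

The block runs five times, printing the running totals 2, 5, 10, 17 and 28 before the final line is reached.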


The “for loop”

for letter in ['a', 'b', 'c']:
    print('Have a letter:')
    print(letter)
print('Finished!')

for1.py

$ python for1.py
Have a letter:
a
Have a letter:
b
Have a letter:
c
Finished!

Let's look at this in practice. The script for1.py in your home directories implements the code from the previous slide together with a final line just to prove that the repeated block is finished with.


Progress

The “for...” loop

Processing each element of a list

for item in items:
    ...item...

Exercise

Complete the script elements1.py
Print out the name of every element.

10 minutes

The file elements1.py contains the Python for a list of all the element names. Complete the script by adding the loop to print out the entries in the list.


“Slices” of a list

>>> abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

>>> abc[1]                 Simple index
'b'                        Single element

>>> abc[1:5]               Slice index
['b', 'c', 'd', 'e']       A new list: “Slice”

Let's go back to extracting elements from lists. We have already seen how to extract a single element from a list by quoting its index in square brackets after the list, or the list's name.

We can also extract parts of lists as lists themselves. These are called “slices” of lists and are created with this variant form of the index.


Slice limits

>>> abc[1:5]               “from” index : “to” index
['b', 'c', 'd', 'e']

abc[1]    first element in the slice
abc[4]    last element in the slice
abc[5]    Element 5 not in slice

Let's look at the slice definition. It's two numbers separated by a colon.

The first is the lower index. The element that has this index in the original list becomes the first element in the new list.

The second number is the index of the first element of the original list that is not in the created list. As ever, Python has the approach that the second number is the first index that doesn't get included.
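
A short sketch of the two limits in action (piece is a name invented for the example):

```python
abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

piece = abc[1:5]
print(piece)         # items number 1, 2, 3 and 4
print(len(piece))    # 5 - 1 = 4 items
print(abc)           # the original list is unchanged
```

A handy consequence of the upper limit being excluded is that the length of the slice abc[m:n] is n-m, provided both limits lie within the list.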


Slice feature

abc[1:3]   + abc[3:5]
['b', 'c'] + ['d', 'e']
['b', 'c', 'd', 'e']
abc[1:5]                   Same


The  “off  by  one”  final  index  system  continues  to  cause  some  people  distress.  It  does 
have  one  useful  feature,  though.  If  you  concatenate  two  slices  from  the  same  list  with 
matching  inner  indices  then  those  indices  “cancel  out”  and  you  get  the  slice 
corresponding  to  the  outer  indices. 
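
The cancelling works wherever the inner index falls between the outer ones, as this sketch checks for every split point of abc[1:5]. The names whole, middle and rejoined are made up for the example.

```python
abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
whole = abc[1:5]

for middle in [1, 2, 3, 4, 5]:
    # Split at "middle", then join the two pieces back together.
    rejoined = abc[1:middle] + abc[middle:5]
    print(rejoined == whole)
```

Each pass prints True, including the two extreme cases where one of the pieces is an empty list.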


Open-ended slices

>>> abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

>>> abc[3:]                    Open ended at the end
['d', 'e', 'f', 'g']           Starts at abc[3]

>>> abc[:5]                    Open ended at the start
['a', 'b', 'c', 'd', 'e']      Ends at abc[4]

We can omit one of the limits.

If we omit the second, upper limit then the slice is the sub-list starting at the lower index and going to the end.

If we omit the lower limit then the slice starts at the beginning. Note, again, that the created list stops one short of the index quoted.


Open-ended slices

>>> abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

>>> abc[:]                     Open ended at both ends
['a', 'b', 'c', 'd', 'e', 'f', 'g']

This prompts the question: what happens if we omit both limits?
We  get  a  copy  of  the  whole  of  the  list. 
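The three open-ended forms can be tried side by side. A small sketch follows; the names `tail`, `head` and `whole` are hypothetical, and the `is` test (which asks whether two names refer to the very same object) is not something these notes have introduced.

```python
abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

tail = abc[3:]    # from index 3 to the end
head = abc[:5]    # from the start, stopping one short of index 5
whole = abc[:]    # both limits omitted: a copy of the whole list

print(tail)       # ['d', 'e', 'f', 'g']
print(head)       # ['a', 'b', 'c', 'd', 'e']
print(whole)      # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# The copy has equal content but is a separate list object.
print(whole == abc)   # True
print(whole is abc)   # False
```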


Progress

Slices

data[m:n]  ->  [data[m], ..., data[n-1]]

data[m:n]
data[:n]
data[m:]
data[:]

Square brackets in Python

[...]           Defining literal lists
numbers[N]      Indexing into a list
numbers[M:N]    Slices

This is really just a variant on the indexing use of square brackets.


Modifying lists: recap

>>> abc                                  abc[2] is 'c'
['a', 'b', 'c', 'd', 'e', 'f', 'g']

>>> abc[2] = 'X'                         New value

>>> abc                                  Changed
['a', 'b', 'X', 'd', 'e', 'f', 'g']


We  used  simple  index  notation  to  read  an  item  from  a  list. 

Recall  that  we  use  exactly  the  same  notation  to  refer  to  “element  number  two”  in  the 
list  but  this  time  we  place  it  on  the  left  hand  side  of  an  assignment. 


Modifying vs. replacing?

>>> xyz = ['x', 'y']

Modifying the list:          Replacing the list:
>>> xyz[0] = 'A'             >>> xyz = ['A', 'B']
>>> xyz[1] = 'B'

>>> xyz
['A', 'B']

This prompts a question. Is there a difference between changing a list one item at a time (which we will call “modifying the list”) and simply changing the whole list in one go (which we will call “replacing the list”)?

There  is  a  difference  but  it  is  subtle. 


What's the difference? (1a)

>>> xyz = ['x', 'y']

Let's start by looking closely at the “modification” model.

We start by establishing a list, “xyz”, with two items in it, the single-character strings 'x' and 'y'.


What's the difference? (1b)

>>> xyz[0] = 'A'        Right hand side evaluated first

[Diagram: the name xyz refers to a list holding the strings 'x' and 'y'; a new string 'A' exists in memory but is not yet part of the list.]

Now we will modify item number zero in the list.

We will examine the assignment very closely.

The right hand side of the assignment is evaluated first. So Python creates a single-character string 'A' in memory.


What's the difference? (1c)

>>> xyz[0] = 'A'        New value assigned

[Diagram: the list that xyz refers to now holds 'A' and 'y'; the old value 'x' is unused and cleaned up.]

Next the left hand side is processed. The list's item number zero is replaced by the freshly minted 'A'. The previous value, 'x', is left behind and Python has internal, automatic mechanisms to delete it from memory if it is no longer referred to anywhere. The posh name for this is “garbage collection”. The unused 'x' is “garbage” and the act of identifying it as unused and deleting it to free up Python memory is called “collection”.


What's the difference?

>>> xyz[1] = 'B'        Repeat for xyz[1] = 'B'

We do the same thing for item number one in the list, giving it the new value 'B'. We now have the same list object but with both its items changed.


What's the difference? (2a)

>>> xyz = ['x', 'y']

[Diagram: the name xyz refers to a list holding the strings 'x' and 'y'.]

Now let's look at the replacement scenario.

Again, we start by creating a two item list called “xyz”. Our starting point is identical.


What's the difference? (2b)

>>> xyz = ['A', 'B']        Right hand side evaluated first

[Diagram: the name xyz still refers to the list holding 'x' and 'y'; a second, new list holding 'A' and 'B' exists in memory.]

Now we do the assignment.

We start, as ever, by evaluating the right hand side. This triggers the creation in Python memory of a whole new list containing 'A' and 'B'.


What's the difference? (2c)

>>> xyz = ['A', 'B']        New value assigned

[Diagram: the name xyz now refers to the new list; the old list and its strings are unused and cleaned up.]

Then Python processes the left hand side. The name “xyz” is now assigned to this new list and the whole of the old list is unused (and garbage collected).
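One way to see “same list” versus “different list” for yourself is the built-in `id()` function, which returns a number identifying an object for as long as that object exists. `id()` is not covered in these notes, so treat the following as a side sketch rather than part of the course material; the names `original_id`, `same_after_modifying` and `same_after_replacing` are hypothetical.

```python
xyz = ['x', 'y']
original_id = id(xyz)        # identity of the list object xyz refers to

# Modification: change the items one at a time.
xyz[0] = 'A'
xyz[1] = 'B'
same_after_modifying = (id(xyz) == original_id)
print(same_after_modifying)      # True: still the same list object

# Replacement: the new list is built first, then the name is moved to it.
xyz = ['A', 'B']
same_after_replacing = (id(xyz) == original_id)
print(same_after_replacing)      # False: the name now refers to a new list
```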


What's the difference?

Modification:    same list, different contents
Replacement:     different list

Does it matter?

So we get a different list with the second approach of replacement. Both cases give us a list with the same content, though, so does it really make a difference?

Here  comes  the  subtlety  I  warned  you  about... 


Two names for the same list

>>> xyz = ['x', 'y']
>>> abc = xyz

Let's suppose we had two names assigned to the same list.

>>> abc[0] = 'A'        Modification
>>> abc[1] = 'B'        Modification
>>> xyz
['A', 'B']

[Diagram: both names, abc and xyz, refer to one list, which now holds 'A' and 'B'.]

If we do the modifications one after the other we simply change the content of the list both names point to, and so we can make the change via the name “abc” and see the results in “xyz”.


Now  we  will  look  at  replacement,  starting  with  exactly  the  same  situation. 


>>> abc = ['A', 'B']        Replacement
>>> xyz
['x', 'y']

[Diagram: abc refers to a new list holding 'A' and 'B'; xyz still refers to the old list holding 'x' and 'y'.]

Assigning “abc” in one go causes the name to point to the new list. But now, instead of the old list being unused, and therefore garbage collected, it is still referred to by the name “xyz”.
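The two cases can be run back to back. This sketch simply repeats the commands from the slides in one script; `shared_result` is a hypothetical name used to keep the first result for comparison.

```python
# Case 1: modification through a second name.
xyz = ['x', 'y']
abc = xyz            # two names, one list
abc[0] = 'A'
abc[1] = 'B'
print(xyz)           # ['A', 'B']  (the shared list was changed)
shared_result = xyz

# Case 2: replacement of one of the names.
xyz = ['x', 'y']
abc = xyz
abc = ['A', 'B']     # abc now names a brand new list
print(xyz)           # ['x', 'y']  (the old list is still referred to by xyz)
print(abc)           # ['A', 'B']
```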


One last trick with slices

>>> abc = ['a', 'b', 'c', 'd', 'e', 'f']        Length 6

>>> abc[2:4]
['c', 'd']

>>> abc[2:4] = ['x', 'y', 'z']

>>> abc
['a', 'b', 'x', 'y', 'z', 'e', 'f']             New length: 7

We have used the simple index notation on the left hand side to modify individual elements of a list. We can also use the slice notation on the left hand side to modify parts of the list. We can even change the length of the list in the process.
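A short sketch of slice assignment changing a list's length; `length_before` is a hypothetical name.

```python
abc = ['a', 'b', 'c', 'd', 'e', 'f']
length_before = len(abc)         # 6

# Replace the two elements abc[2] and abc[3] with three new ones.
abc[2:4] = ['x', 'y', 'z']

print(abc)                       # ['a', 'b', 'x', 'y', 'z', 'e', 'f']
print(length_before, len(abc))   # 6 7
```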


Progress

Modifying lists        values[N] = new_value

Modification ≠ replacement

values[0] = 'alpha'
values[1] = 'beta'
values[2] = 'gamma'

values = ['alpha', 'beta', 'gamma']

Exercise

1. Predict what these will do.

2. Then run the commands.

>>> alpha = [0, 1, 2, 3, 4]
>>> beta = alpha
>>> gamma = alpha[:]
>>> delta = beta[:]
>>> beta[0] = 5
>>> alpha
>>> beta
>>> gamma
>>> delta
Appending to a list

>>> abc = ['x', 'y']

>>> abc
['x', 'y']

>>> abc.append('z')        Add one element to the end of the list.

>>> abc
['x', 'y', 'z']

A  very  common  modification  requirement  is  to  be  able  to  add  something  to  the  end  of 
a  list.  In  fact,  it's  very  common  to  create  lists  by  starting  with  an  empty  list  and  building 
one  item  at  a  time. 

To  do  this  we  have  to  introduce  a  new  element  of  Python  syntax. 

Certain  Python  objects  (for  example,  lists)  have  built-in  functions,  called  “methods”. 
We  see  a  simple  example  of  one  of  these  here. 

The name “abc” is assigned to a list, initially ['x', 'y'].

We then use this new syntax, “abc.append('z')”, to append 'z' to the end of the list.

So  what's  going  on  here? 


List “methods”

abc.append('z')

abc         A list
.           A dot
append      A built-in function
( )         Round brackets
'z'         Argument(s) to the function

Built-in functions: “methods”

The syntax for a method (a built-in function) is to take

1. the object (or more typically a name assigned to the object), followed by

2. a dot to act as the glue, followed by

3. the name of the method, “append” in this case, followed by

4. round brackets to contain

5. the arguments passed into the function.

Note that we don't seem to pass the list itself in as an argument. Built-in functions know where they came from and have access to the object.
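As a sketch of the “start with an empty list and build it up” pattern mentioned earlier, using only the method syntax just described (the name `letters` is hypothetical):

```python
# Start with an empty list and build it one item at a time.
letters = []
letters.append('x')
letters.append('y')
letters.append('z')

# object . method ( argument ): the list is not passed as an argument;
# the method already has access to the list it belongs to.
print(letters)            # ['x', 'y', 'z']
```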


Methods

object.method(arguments)

Connected by a dot
Privileged access to object
“Object-oriented programming”


These  “methods”  are  core  to  the  idea  of  “object  oriented  programming”.  While  we 
won't  dwell  on  it  too  much  in  this  course,  there  are  volumes  written  on  this  type  of 
programming. 

The  UCS  offers  a  course  which  may  be  useful  to  take  this  aspect  of  Python 
programming  further: 

Object  Oriented  Programming:  Introduction  using  Python: 
http://training.csx.cam.ac.uk/ucs/course/ucs-oop 


The append() method

>>> abc = ['x', 'y', 'z']

>>> abc.append('A')
>>> abc.append('B')        One element at a time
>>> abc.append('C')

>>> abc
['x', 'y', 'z', 'A', 'B', 'C']

Let's return to the only example we have met so far: the “append()” method for lists. It adds a single element to the list each time it is called.


Beware!

>>> abc = ['x', 'y', 'z']

>>> abc.append(['A', 'B', 'C'])          Appending a list

>>> abc
['x', 'y', 'z', ['A', 'B', 'C']]         Get a list as the last item

You cannot use the append() method to add multiple items by putting them in a list. All you get is a “mixed list” that has (in this case) three strings and a list as its four elements.


“Mixed lists”

['x', 2, 3.0]

['alpha', 5, 'beta', 4, 'gamma', 5]

Mixed lists, while syntactically legal, are almost always the sign of confused thinking. Avoid them. Stick to lists of just one type.

Don't  forget  that  a  list  of  lists  of  integers  is  a  perfectly  sound  list.  It's  a  list  of  a  single 
type:  lists  of  integers.  Each  of  its  elements  is  also  a  perfectly  sound  list:  a  list  of 
integers. 


The extend() method

>>> abc = ['x', 'y', 'z']

>>> abc.extend(['A', 'B', 'C'])        All in one go

>>> abc
['x', 'y', 'z', 'A', 'B', 'C']

Utterly unnecessary!

So how do we add a list to the end of a list?

We can use the extend() method which takes a list of elements as its argument and adds them individually to the end.

But there is no need to ever use this!

Remember that lists can be concatenated with the “+” sign. So we have this much simpler syntax to do it.
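The three ways of “adding a list” discussed above can be compared in one short sketch. The names `nested`, `extended` and `joined` are hypothetical.

```python
# append() with a list argument adds ONE item: the list itself.
nested = ['x', 'y', 'z']
nested.append(['A', 'B', 'C'])
print(nested)         # ['x', 'y', 'z', ['A', 'B', 'C']]
print(len(nested))    # 4

# extend() adds each element of its argument individually.
extended = ['x', 'y', 'z']
extended.extend(['A', 'B', 'C'])
print(extended)       # ['x', 'y', 'z', 'A', 'B', 'C']

# Concatenation with "+" gives the same content, but as a new list
# that is then assigned back to the name.
joined = ['x', 'y', 'z']
joined = joined + ['A', 'B', 'C']
print(joined)         # ['x', 'y', 'z', 'A', 'B', 'C']
```

One difference is worth keeping in mind: extend() changes the existing list, whereas “+” builds a new list (a replacement, in the terms used earlier), which matters if a second name refers to the original list.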


Changing the list “in place”

>>> abc.append('w')                       No value returned

>>> abc
['x', 'y', 'z', 'w']                      List itself is changed

>>> abc.extend(['A', 'B'])                No value returned

>>> abc
['x', 'y', 'z', 'w', 'A', 'B']            List itself is changed

There's something worth noticing about both the append() method and the extend() method. Both of them modify the list they are a method of. They don't return a new modified list, they silently modify the list itself.
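“No value returned” can be made visible by capturing whatever the method hands back. In this sketch the names `result` and `wrong` are hypothetical; what comes back is Python's special “nothing” value, None.

```python
abc = ['x', 'y', 'z']

# Capture whatever append() hands back: it is None, not a list.
result = abc.append('w')
print(result)     # None
print(abc)        # ['x', 'y', 'z', 'w']

# A common slip: expecting extend() to return the extended list.
wrong = abc.extend(['A', 'B'])
print(wrong)      # None
print(abc)        # ['x', 'y', 'z', 'w', 'A', 'B']
```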


Another list method: sort()

>>> abc = ['z', 'x', 'y']        New method

>>> abc.sort()                   No arguments

>>> abc
['x', 'y', 'z']

Let's look at a couple more methods like this.

We start with sort() which causes the list it is attached to to become sorted in place.

A list can be of any type of item which supports “<” etc. and it will happily sort.

Any type of sortable element

>>> abc = [3, 1, 2]
>>> abc.sort()
>>> abc
[1, 2, 3]

>>> abc = [3.142, 1.0, 2.718]
>>> abc.sort()
>>> abc
[1.0, 2.718, 3.142]

Any sort of list where “<” etc. makes sense can be sorted.
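The link between “<” and sorting can be seen with a list of words. A small sketch, with a hypothetical list `words`:

```python
words = ['pear', 'apple', 'fig']

# "<" makes sense for strings, so a list of strings can be sorted.
print('apple' < 'fig')     # True
print('fig' < 'pear')      # True

# sort() rearranges the list in place and returns nothing useful.
returned = words.sort()
print(words)               # ['apple', 'fig', 'pear']
print(returned)            # None
```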


Another list method: insert()

Index:       0    1    2    3
>>> abc = ['w', 'x', 'y', 'z']

>>> abc.insert(2, 'A')           Insert just before element number 2

>>> abc
['w', 'x', 'A', 'y', 'z']        'y' is the “old 2”

Here's another. The append() method adds an item to the end of its list. How do we add items elsewhere?

The insert() method takes two arguments. The first is the index before which the item is to be inserted and the second is the item itself.
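A sketch of insert() at two different positions, reusing the list from the slide:

```python
abc = ['w', 'x', 'y', 'z']

# Insert 'A' just before the element currently at index 2 ('y').
abc.insert(2, 'A')
print(abc)        # ['w', 'x', 'A', 'y', 'z']

# Inserting before index 0 puts the item at the very start.
abc.insert(0, 'B')
print(abc)        # ['B', 'w', 'x', 'A', 'y', 'z']
```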


Progress

List methods:    Change the list itself
                 Don't return any result

list.append(item)

list.extend([item1, item2, item3])

list.sort()

list.insert(index, item)


We  have  met  four  list  “methods”  which  modify  the  list  itself  but  don't  return  any  result. 


Exercise

1. Predict what this will do.

2. Then run the commands.

data = []
data.append(8)
data.extend([6, 3, 9])
data.sort()
data.append(1)
data.insert(3, 2)
data
Creating new lists

>>> numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> copy = []

>>> for number in numbers:
...     copy.append(number)        Simple copying
...

>>> copy
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Copying items across one at a time with a for... loop is typically overkill. However, if you want to change that number as it gets copied across then it's quite a sensible approach.

This is an example of straightforward copying.


Creating new lists

>>> numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]        Boring!

>>> squares = []

>>> for number in numbers:
...     squares.append(number**2)                    Changing the value
...

>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


And  here's  an  example  of  changing  the  number  as  it  goes  across.  In  this  case  we 
square  it. 

Note  that  we  are  using  a  literal  list  of  the  numbers  from  0  to  9.  There  must  be  a  better 
way  to  do  it  than  that! 


Lists of numbers

>>> numbers = range(0, 10)

>>> numbers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

range(0, 10)  ->  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]        c.f. numbers[0:5]

There is!

Python has built into it a function called range() which generates lists of whole numbers. As ever, it starts at the first argument and ends one short of the second argument (c.f. slices).


Creating new lists

>>> numbers = range(0, 10)        Better!

>>> squares = []

>>> for number in numbers:
...     squares.append(number**2)
...

>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


This  makes  our  instructions  a  little  more  sensible.  More  importantly,  it  makes  them 
more  flexible.  I  can  adapt  this  program  to  run  up  to  99  rather  than  9  with  a  simple  edit 
of  just  one  number. 
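As a script, the same idea might look like the sketch below. Two things here are assumptions rather than part of the notes: the name `upper`, and the wrapping `list(...)`, which is harmless in the Python 2.x versions this course covers but is needed on Python 3, where range() no longer returns a list directly.

```python
# Change this one number to 100 to run up to 99 instead of 9.
upper = 10

# list(...) keeps this working on Python 3 as well as Python 2.
numbers = list(range(0, upper))

squares = []
for number in numbers:
    squares.append(number ** 2)

print(squares)    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```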


Lists of words

>>> 'the cat sat on the mat'.split()        string . method
['the', 'cat', 'sat', 'on', 'the', 'mat']

>>> 'The cat sat on the mat.'.split()
['The', 'cat', 'sat', 'on', 'the', 'mat.']

No special handling for punctuation.

There are other ways to get lists. A method that's often useful when processing lines of data is the split() method on strings. This returns a list of “words” which are the components of the string separated by white space. It is a very primitive mechanism; there are better but more complex methods elsewhere in Python. For example, it only splits on white space such as spaces, not on punctuation.
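A sketch of split() on a sentence and on a line of data. The data line, and the names `words`, `sentence` and `fields`, are made up for illustration.

```python
words = 'the cat sat on the mat'.split()
print(words)          # ['the', 'cat', 'sat', 'on', 'the', 'mat']

# Punctuation is not treated specially: the full stop stays on 'mat.'
sentence = 'The cat sat on the mat.'.split()
print(sentence[5])    # mat.

# Runs of spaces and the trailing newline are all treated as separators.
fields = '1.0  2.0   3.0\n'.split()
print(fields)         # ['1.0', '2.0', '3.0']
```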


Progress

Ways to build lists:

data[:]         slices
for loops       appending elements
range(m, n)     function
split()         string method

Exercise

Write a script from scratch: transform.py

1. Run a variable n from 0 to 10 inclusive.

2. Create a list with the corresponding values of n² + n + 41.

3. Print the list.

Here are some hints to help you with the exercise:

1.  Run  a  variable  from  0  to  10  inclusive. 

To  run  a  variable  through  a  list  you  will  need  to  use  a  for...  loop. 

To get a list of the numbers from 0 to 10 inclusive you can either use a literal list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] or you can use the range(from, to) function. Recall that the list produced stops one short of the upper limit.

2.  Create  a  list... 

In  this  case,  when  you  are  building  an  “outputs”  list  from  an  “inputs”  list  (0-10),  your 
best  bet  is  to  start  with  an  empty  outputs  list  before  the  for...  loop  starts  and  to  add  an 
output  to  it  for  each  run  of  the  for...  loop. 


Brief diversion


I  want  to  take  a  quick  diversion  to  discuss  something  that  may  be  coming  to  mind  but 
which  we  are  not  going  to  handle  yet. 

Image  (c)  FreeFoto.com:  licensed  under  Creative  Commons  Attribution- 
Noncommercial-No  Derivative  Works  3.0  Licence. 


Arrays as lists of lists

 0.0  -1.0  -4.0  -1.0   0.0
 1.0   0.0  -1.0   0.0   1.0
 4.0   1.0   0.0   1.0   4.0
 1.0   0.0  -1.0   0.0   1.0
 0.0  -1.0  -4.0  -1.0   0.0

[ [0.0, -1.0, -4.0, -1.0, 0.0],
  [1.0,  0.0, -1.0,  0.0, 1.0],
  [4.0,  1.0,  0.0,  1.0, 4.0],
  [1.0,  0.0, -1.0,  0.0, 1.0],
  [0.0, -1.0, -4.0, -1.0, 0.0] ]

Scientists deal in arrays of data, not just linear lists. Two-, three- and four-dimensional arrays are common.

Plain Python can handle multi-dimensional data, but its facilities are limited. Python would represent a two-dimensional array as a list of lists. The “outer list” would have one item per row. Each item would be a list of the elements in that row.


Indexing from zero

[Figure: the same 5 × 5 array and its list-of-lists form, with a single element picked out.]

a[2][3]

And don't forget that Python indexes from zero.
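A sketch of double indexing on the array from the slides, here given the name `a` to match the `a[2][3]` label. The names `row` and `element` are hypothetical.

```python
# The 5 x 5 array from the slides, as a list of lists (one inner list per row).
a = [
    [0.0, -1.0, -4.0, -1.0, 0.0],
    [1.0,  0.0, -1.0,  0.0, 1.0],
    [4.0,  1.0,  0.0,  1.0, 4.0],
    [1.0,  0.0, -1.0,  0.0, 1.0],
    [0.0, -1.0, -4.0, -1.0, 0.0],
]

row = a[2]          # the third row, because indexing starts at zero
print(row)          # [4.0, 1.0, 0.0, 1.0, 4.0]

element = a[2][3]   # index 3 within that row
print(element)      # 1.0
```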


Referring to a row: easy

[Figure: the same 5 × 5 array and its list-of-lists form, with a single row picked out.]

a[2]

Referring to a single row as a “thing” is easy...


Referring to a column

[Figure: the same 5 × 5 array and its list-of-lists form, with a single column picked out.]

No Python construct!

...but there is no way to refer to a column with simple syntax.
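There is no single piece of syntax for a column, but one can be assembled with the tools met so far: a for... loop and append(). The sketch below is one possible workaround, not a recommended technique; the names `column_index` and `column` are hypothetical.

```python
a = [
    [0.0, -1.0, -4.0, -1.0, 0.0],
    [1.0,  0.0, -1.0,  0.0, 1.0],
    [4.0,  1.0,  0.0,  1.0, 4.0],
    [1.0,  0.0, -1.0,  0.0, 1.0],
    [0.0, -1.0, -4.0, -1.0, 0.0],
]

# Build the column by taking the same index from every row in turn.
column_index = 2
column = []
for row in a:
    column.append(row[column_index])

print(column)     # [-4.0, -1.0, 0.0, -1.0, -4.0]
```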


Numerical Python?

Hold tight!

Later in this course, “numpy”: powerful support for
    numerical arrays
    matrices

But all is not lost!

Later in this course we will refer to a set of Python functions and objects known as “numerical Python” or “numpy” for short. This will solve all our problems.

Be  patient. 


We  now  return  you  to  your  normally  scheduled  course. 

Image  (c)  Flickr  user  illustir,  released  under  a  Creative  Commons  licence  v2.0. 
http://www.flickr.com/photos/alper/3257406961/sizes/o/in/photostream/


Files

[Diagram: input data files #1, #2 and #3.]

Let's put lists behind us now and move on to look at something else.

At  the  moment  all  our  Python  scripts  have  been  self-contained.  All  the  data  they  act  on 
is  wired  into  them.  We  want  to  move  away  from  that  and  have  them  interact  directly 
with  the  system.  The  first  example  of  that  will  be  interacting  with  files. 

So,  we  want  our  scripts  to  be  able  to  read  data  in  from  multiple  files  and  write  results 
out  to  multiple  files. 


Reading a file

1. Opening a file

2. Reading from the file

3. Closing the file

We will start by reading a file. The procedure for this comes in three distinct phases.

First we will get our hooks into the file we want to read from. This is the transformation from a name of the file to a Python object that represents the file. This is called “opening the file”.

Second we will use that newly minted Python object to read the data from the file.

Third we will dispose of the Python object corresponding to the file to alert the system that we no longer need access to it. This is called “closing the file”.


Let's  start  with  opening  the  file.  This  is  conceptually  the  most  complicated  part  of  the 
whole  process. 

We start with the name of the file. This is just a string. In our case we have the name of a file “data.txt”. We need to convert that string into a Python object that will let us access the file. The Python object, internally, will need to know what file it corresponds to and how far into the file we have read. On initial creation, of course, this position in the file (known as the “offset”) will point to the very start of the file.

This mapping from file name to the file object itself is handled by a Python function called “open()”.


Python command:

>>> data = open('data.txt')

'data.txt'    file name (a string)
data          Python file object: refers to the file with name 'data.txt', initial position at start of data

So, what's the Python syntax?

The open() function takes the file name as its argument and returns the Python file object.
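A runnable sketch of opening a file follows. Several things in it are assumptions that go beyond the notes: it first creates its own small data file in a temporary directory (using the `tempfile` and `os` modules and writing to a file, none of which has been covered here) so that it can run anywhere, and it uses the file object's tell() method to display the current position.

```python
import os
import tempfile

# Setup for this sketch only: create a small 'data.txt' to read from.
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, 'data.txt')
setup = open(file_name, 'w')
setup.write('line one\nline two\nline three\nline four\n')
setup.close()

# The step described above: file name (a string) in, file object out.
data = open(file_name)

# Immediately after opening, the position is at the very start of the file.
start_position = data.tell()
print(start_position)     # 0

data.close()
```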


How  can  we  use  this  Python  object?  How  do  we  read  data  from  the  file? 


>>> data = open('data.txt')

>>> data.readline()        the Python file object, a dot, a “method”
'line one\n'               first line of the file, complete with “\n”

>>> data.readline()        same command again
'line two\n'               second line of file

We can read the file's content one line at a time.

The file object, data, has a method readline() which reads one line from the file and returns it as a string.

Note that the “end of line” marker is returned as part of the line (at the end, obviously). Also note that if we call the readline() method a second time we get the second line of the file.


What's happening is this: Immediately after creating the file object with the open() function its position pointer points to the very start of the file.

>>> data = open('data.txt')
>>> data.readline()
'line one\n'

position: after end of first line / at start of second line

line one\n
line two\n
line three\n
line four\n

When we call the file object's readline() method we read out the first line (including the new line marker) and the position indicator is changed to point to the start of the second line.


>>> data = open('data.txt')
>>> data.readline()
'line one\n'
>>> data.readline()
'line two\n'

position: after end of second line / at start of third line

line one\n
line two\n
line three\n
line four\n

When we call data.readline() again the pointer is moved forwards a second time, this time to the start of the third line.

Each time we call data.readline(), the reading starts at the current position, runs to just after the next end of line character and is left ready for the next lot of reading.
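The way each readline() call picks up where the last one stopped can be sketched as below. The temporary data file and the names `first`, `second` and `third` are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Setup for this sketch only: create a small 'data.txt' to read from.
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, 'data.txt')
setup = open(file_name, 'w')
setup.write('line one\nline two\nline three\nline four\n')
setup.close()

data = open(file_name)
first = data.readline()     # reads up to and including the first "\n"
second = data.readline()    # carries on from where the first call stopped
third = data.readline()
data.close()

# repr() shows the string with its quotes and the "\n" made visible.
print(repr(first))     # 'line one\n'
print(repr(second))    # 'line two\n'
print(repr(third))     # 'line three\n'
```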


>>> data = open('data.txt')
>>> data.readline()
'line one\n'
>>> data.readline()
'line two\n'
>>> data.readlines()
['line three\n', 'line four\n']

position: end of file

line one\n
line two\n
line three\n
line four\n

We can read the entire rest of the file in one go with the readlines() method (n.b. the terminal “s”). This returns a list of all the lines. In practice we won't do this as we will meet better methods to do it later.


>>> data.readline()
'line two\n'
>>> data.readlines()
['line three\n', 'line four\n']
>>> data.close()        disconnect

Once we have read all the data we want from the file (not necessarily all of it) we should close the file to tell the system we no longer need it. This is done with the method close().


Common trick

for line in data.readlines():
    stuff

for line in data:
    stuff

Python “magic”: treat the file like a list and it will behave like a list

I mentioned earlier that Python has a couple of tricks so that you would never need to run the readlines() method directly. This is the first of them.

The most common reason for wanting a list of the lines in a file is so that you can step through them one at a time in a for... loop. Python's trick is that for many types of object where there is an obvious “list view” of that object, you can simply drop the object in to a situation where a list would be expected.

For example, if we drop a file into the list slot in a for... loop, it behaves like the list of lines. The two blocks of Python code behave in exactly the same way.
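The equivalence of the two loops can be checked by collecting the lines each one sees. As before, the temporary data file is an assumption made for the sake of a self-contained sketch, and the names `via_readlines` and `via_file` are hypothetical.

```python
import os
import tempfile

# Setup for this sketch only: create a small 'data.txt' to read from.
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, 'data.txt')
setup = open(file_name, 'w')
setup.write('line one\nline two\nline three\nline four\n')
setup.close()

# First form: loop over the list returned by readlines().
data = open(file_name)
via_readlines = []
for line in data.readlines():
    via_readlines.append(line)
data.close()

# Second form: drop the file object itself into the loop.
data = open(file_name)
via_file = []
for line in data:
    via_file.append(line)
data.close()

print(via_readlines == via_file)    # True
```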


Simple example script

count = 0

data = open('data.txt')        1. Open the file

for line in data:              2. Read the file, one line at a time
    count = count + 1

data.close()                   3. Close the file

print(count)

So let's see a real example. This is a primitive “count the lines” script. All it does is to count the lines in a file.

It starts by setting the counter to zero, as no lines have been read yet.

The first file operation is that it opens the file. This returns a Python file object. The author's habit is to name them after the file name if the file name is embedded in the script but it is quite arbitrary. Instead of “data” it could have been called “input”, “file_whose_lines_are_to_be_counted”, or “fred”.

The  second  file  operation  is  to  read  the  lines  from  the  file,  one  at  a  time.  We  do  this 
with  a  for...  loop.  This  is  the  classic  way  to  read  in  a  text  file  in  Python. 

Within  the  block  of  the  for...  loop  we  act  on  each  line  as  it  comes  up.  In  our  case  we 
ignore  what's  in  the  line  itself,  but  just  increment  the  counter. 

The  third  file  operation  is  to  close  the  file  after  we  have  finished  with  it. 

Finally  we  print  out  the  number  of  lines. 


Progress

filename -> open() -> readable file object

data = open('input.dat')

data.readline()

for line in data:
    ...line...
Exercise

Write a script treasure.py from scratch to do this:

Open the file treasure.txt.
Set three counters equal to zero: n_lines, n_words, n_chars
Read the file line by line.
For each line:
    increase n_lines by 1
    increase n_chars by the length of the line
    split the line into a list of words
    increase n_words by the length of the list
Close the file.
Print the three counters.

Here's an exercise. It's a serious one that should take some time. (The file treasure.txt contains the entire text of “Treasure Island” by Robert Louis Stevenson and is provided courtesy of Project Gutenberg.)

Some  hints: 

1. Open the file.
   file = open(filename)

2. Set three counters equal to zero.
   n_lines = 0 etc.

3. Read the file line by line. Recall the Python idiom that if you treat a file like a list it will behave like a list of the lines in that file:
   for line in file:

4. Increase n_lines by one. Please tell me you don't need this hint.
   n_lines = n_lines + 1

5. Increase n_chars by the length of the line.
   Recall that len(string) gives the length of the string.

6. Split the line into a list of words. Recall the split() method on strings.
   words = line.split()

7. Increase n_words by the length of the list of words.
   Recall that len(list) gives the length of the list, i.e. the number of items in it.

8. Close the file. Recall the close() method on file objects.

9. Print the three counters. This will do for this exercise:
   print(n_lines, n_words, n_chars)


Converting  the  type  of  input 


Problem: 


numbers.dat

1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0

['1.0\n', '2.0\n', '3.0\n', '4.0\n', '5.0\n', '6.0\n',
 '7.0\n', '8.0\n', '9.0\n', '10.0\n', '11.0\n']

List of strings, not a list of numbers!

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Now  let's  suppose  we  want  to  do  something  with  the  content  of  the  lines,  as  opposed 
to just counting the lines. We immediately hit a problem: reading from a file always
delivers strings. We can't do arithmetic with strings, so we need some way to get from,
say, the string '1.0' to the floating point number 1.0.


Type  conversions 


>>> float('1.0\n')          String → Float
1.0

>>> str(1.0)                Float → String
'1.0'                       No newline

>>> float(1)                Int → Float
1.0

>>> int(-1.5)               Float → Int
-1                          Rounding to zero

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Python  has  a  set  of  functions  for  converting  types.  Each  of  these  is  named  after  the 
type  it  converts  into  and  takes  whatever  it  can  as  an  argument. 
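
As a minimal sketch of these conversion functions, the following repeats the four conversions from the slide. The variable names are ours, chosen only for illustration, and each print() call is given a single value so that it behaves the same way in Python 2.6 and in later versions.

```python
# Each conversion function is named after the type it converts into.
as_float = float('1.0\n')   # string -> float; the trailing newline is ignored
as_string = str(1.0)        # float -> string; no newline is added
from_int = float(1)         # int -> float
towards_zero = int(-1.5)    # float -> int; rounds towards zero

print(as_float)
print(as_string)
print(from_int)
print(towards_zero)
```

Note that int() rounds towards zero rather than down, so a negative float is converted to the integer above it.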


Type  conversions  to  lists 

>>> list('hello')                     String → List
['h', 'e', 'l', 'l', 'o']

>>> data = open('data.txt')
>>> list(data)                        File → List
['line one\n', 'line two\n',
 'line three\n', 'line four\n']

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Recall  that  lists  are  valid  Python  types.  Therefore  there  is  a  list( )  function  that 
attempts  to  convert  its  argument  into  a  list. 

Strings  are  converted  into  lists  of  characters.  File  objects  are  converted  into  lists  of 
lines. 
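
The two list() conversions can be tried out with a sketch like the one below. It first creates a small file in a temporary directory so that it is self-contained; the file name and its two lines are assumptions made for the example, not files supplied with the course.

```python
import os
import tempfile

# String -> list of single-character strings.
letters = list('hello')

# Create a small file to read back (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
out = open(path, 'w')
out.write('line one\nline two\n')
out.close()

# File object -> list of lines, each keeping its trailing newline.
data = open(path)
lines = list(data)
data.close()

print(letters)
print(lines)
```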


Example  script 

sum = 0.0

data = open('numbers.dat')

for line in data:
    sum = sum + float(line)

data.close()

print sum

Let's see the conversions in practice. Given our numbers.dat file with one floating point
number per line, this script adds up all the values.

It  reads  each  line  of  the  file  exactly  as  it  did  before,  but  this  time  it  takes  the  string  of 
the  line  and  converts  it  into  a  float  before  doing  arithmetic  with  it. 
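
Here is a self-contained sketch of the same idea. It assumes nothing about the course's own numbers.dat: it writes a stand-in file holding 1.0 to 11.0 into a temporary directory and then adds the values up. We call the running total "total" rather than "sum" only because sum is also the name of a built-in Python function.

```python
import os
import tempfile

# Build a stand-in for numbers.dat: the values 1.0 to 11.0, one per line.
path = os.path.join(tempfile.mkdtemp(), 'numbers.dat')
out = open(path, 'w')
for n in range(1, 12):
    out.write(str(float(n)) + '\n')
out.close()

total = 0.0
data = open(path)
for line in data:
    # Each line is a string such as '1.0\n'; convert it before adding.
    total = total + float(line)
data.close()

print(total)
```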


To  date  we  have  just  been  reading  from  files.  Now  we  want  to  write  to  them  too.  We 
will  open  a  file  again,  but  this  time  we  will  need  to  declare  that  we  are  opening  it  for 
writing. 


Writing  to  a  file 


output = open('output.txt')          Default
                                     Equivalent
output = open('output.txt', 'r')     Open for reading

output = open('output.txt', 'w')     Open for writing



Let's start by looking at the open() function we know already. This takes a file name
and opens a file for reading. Actually, it can open a file for reading or writing, and the
behaviour is governed by a second argument saying which. If this second argument is
omitted then the file is opened for reading. If we want to explicitly include that second
argument then the way to declare that the file is to be opened for reading is to set the
argument to the letter 'r'.

If we want the file opened for writing then we set that second argument to be the
letter 'w'.


Opening  a  file  for  writing 

open('output.txt', 'w')

[Diagram: an empty file with the position marker at the start of the file.]

The open() function returns a file object as ever, with a pointer set to the beginning of
the file. Note that if the file already exists, opening it for writing truncates it to zero
bytes: whatever it contained before is lost.


>>> output = open('output.txt', 'w')

'output.txt'    file name
'w'             open for writing



So,  how  do  we  write  to  a  file? 

We  start  by  opening  it  for  writing. 


>>> output = open('output.txt', 'w')
>>> output.write('alpha\n')

write()       Method to write a lump of data
'alpha\n'     Lump of data
“Lump”: need not be a line.
Current position changed.

Then we use the write() method of the file object. The readline() method takes no
argument and returns a line. The write() method takes a line (actually an arbitrary
lump of data) and returns nothing.

The current position marker (the offset) is moved to the end of what has been written,
which is the new end of the file.


>>> output = open('output.txt', 'w')
>>> output.write('alpha\n')
>>> output.write('bet')               Lump of data to be written

The write() method does not need to be passed lines.


>>> output = open('output.txt', 'w')
>>> output.write('alpha\n')
>>> output.write('bet')
>>> output.write('a\n')               Remainder of the line


>>> output = open('output.txt', 'w')
>>> output.write('alpha\n')
>>> output.write('bet')
>>> output.write('a\n')
>>> output.writelines(['gamma\n', 'delta\n'])     Method to write a list of lumps


Just as there is a readlines() method there is a writelines() one too.


>>> output = open('output.txt', 'w')
>>> output.write('alpha\n')
>>> output.write('bet')
>>> output.write('a\n')
>>> output.writelines(['gamma\n', 'delta\n'])
>>> output.close()                    Python is done with this file.

Data may not be written to disc until close()!

Once we have written what we want we must close the file with the close() method.
We could get away without closing a file opened for reading, but we must close it after writing.
There is no guarantee that the data will actually make it to disc until the file is closed.
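
The whole write sequence can be checked with a sketch like this one. It writes to a file in a temporary directory (our choice, so that no existing output.txt is overwritten), closes it, and then reads it back to see what reached the disc.

```python
import os
import tempfile

# Write to a scratch location rather than the current directory.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

output = open(path, 'w')
output.write('alpha\n')
output.write('bet')                        # a lump that is not a whole line
output.write('a\n')                        # the remainder of that line
output.writelines(['gamma\n', 'delta\n'])  # a list of lumps
output.close()                             # only now is the data certain to be on disc

# Read the file back to check its contents.
check = open(path)
lines = list(check)
check.close()

print(lines)
```

The two partial writes, 'bet' and 'a\n', come back as the single line 'beta\n': write() adds nothing of its own between lumps.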


Only on close() is it guaranteed that the data is on the disc!

>>> output.close()



We  repeat:  You  must  always  close  a  file  after  writing. 


Progress 

filename → open() → writable file object

data = open('input.dat', 'w')

data.write(line)          line must include \n

data.close()              “flushes” to disc


Example 

output = open('output.txt', 'w')
output.write('Hello, world!\n')
output.close()



Rather  than  bore  you  with  a  trivial  exercise  we'll  give  a  very  quick  example  here.  This 
three  line  script  is  a  complete  “write  to  a  file”  Python  script. 


Example  of  a  “filter” 

Reads  one  file,  writes  another. 



To  do  something  useful  with  files  we  need  to  read  and  write  data  at  the  same  time. 
The  classic  example  of  this  is  a  “filter”  which  reads  in  one  file  and  writes  out  another 
based  on  the  input's  contents  a  line  at  a  time. 

We will write very simple filters right now and look at more complex ones later in the
course when we've learnt a few more tricks.


Example  of  a  “filter” 

input = open('input.dat', 'r')                    Setup
output = open('output.dat', 'w')
line_number = 0

for line in input:                                Ugly!
    line_number = line_number + 1
    words = line.split()
    output.write('Line ')
    output.write(str(line_number))
    output.write(' has ')
    output.write(str(len(words)))
    output.write(' words.\n')

input.close()                                     Shutdown
output.close()

filter1.py

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Here's  a  straightforward  example. 

The  setup  opens  the  two  files.  It's  a  matter  of  personal  choice  whether  or  not  you  are 
explicit  about  the  read-only  open  of  the  input.  The  author  thinks  it  helps  to  contrast  the 
two  operations. 

Note  the  explicit  close  operations.  Get  into  this  habit  even  if,  in  this  particular  case, 
they  would  have  been  closed  by  the  script  terminating. 

Note  how  ugly  the  output  writing  lines  are.  We  can  do  much  better  and  will  see  how  to 
later  in  the  course. 
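
For anyone who wants to run the filter without first preparing an input.dat, here is a self-contained sketch. The two-line input and the temporary directory are assumptions made for the example, and the input file object is called "source" rather than "input" only because input is also the name of a built-in function. The five write() calls of the slide are joined into one string here; the output is the same.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
in_path = os.path.join(folder, 'input.dat')
out_path = os.path.join(folder, 'output.dat')

# Prepare a tiny input file (made-up contents).
setup = open(in_path, 'w')
setup.write('one two three\nfour five\n')
setup.close()

# Setup: open one file for reading and one for writing.
source = open(in_path, 'r')
output = open(out_path, 'w')
line_number = 0

# Read the input a line at a time, writing one output line for each.
for line in source:
    line_number = line_number + 1
    words = line.split()
    output.write('Line ' + str(line_number) + ' has ' + str(len(words)) + ' words.\n')

# Shutdown: close both files explicitly.
source.close()
output.close()

# Read the result back to check it.
check = open(out_path)
report = list(check)
check.close()
print(report)
```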


Exercise 


Change treasure.py
to do this:

Read treasure.txt and write treasure.out.

For each line write to the output:
    line number
    number of words on the line
    number of characters in the line
separated by TABs.

At the end output a summary line:
    number of lines
    total number of words
    total number of characters
separated by TABs too.
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Hints: 

1.  Start with data.txt before trying your script out on the full text of Treasure
Island.

2.  If the line number is in n_lines, the line is called line, and the list of words is
called words, then the string to output for each line is this:

str(n_lines) + '\t' + str(len(words)) + '\t' + str(len(line)) + '\n'

3.  Work with print() to get the output right and then change to
output.write().


Problem 
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Let's  look  at  a  common  problem  in  writing  a  script  (in  any  language).  We  run  a  for... 
loop  using  a  variable  “n”.  Doing  this  will  overwrite  any  previous  definition  of  n  we  had 
elsewhere  in  the  script.  If  the  script  is  short  then  this  isn't  really  a  problem.  However, 
as  the  script  gets  longer  (and  they  rarely  seem  to  get  shorter!)  it  becomes  an 
increasing  risk. 


Solution  in  principle 
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The  solution  would  be  to  somehow  isolate  the  variable  name  “n”  within  the  for...  loop 
from  any  use  of  the  same  name  outside. 


Solution  in  principle 


The  names 
used  inside 
never  get  out 


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The  isolation  of  the  for...  loop  can't  be  absolute,  obviously.  We  want  to  get  the  limit 
(11  in  this  case)  in  and  the  results  out.  But  we  don't  really  care  what  they  are  called 
inside  the  for...  loop.  We  want  to  pass  the  value  11  in  and  get  the  value  of  the  list 
out. 


Solution  in  practice 

results = my_function(11)
output      function   input

Need to be able to define our own functions!

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We  implement  this  by  building  our  own  function.  We  will  pass  in  the  value  we  want  as 
an  argument  and  read  out  the  value  we  get  as  a  result.  We  can  then  assign  this  to 
whatever  variable  name  we  want. 

(We  can  also  use  whatever  variable  name  we  want  as  the  input  argument  too  instead 
of  a  literal  value.) 


Defining  our  function 


def my_function(limit):

def            define
my_function    function name
(limit)        input
:              colon
               indentation follows
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So  let's  define  a  function. 

We  start  with  the  new  Python  keyword  “def”  which  starts  the  definition  of  a  function. 
This  is  followed  by  the  name  of  the  function. 

Then  comes  the  indicator  for  the  function's  arguments.  In  this  introductory  course  we 
won't  worry  about  optional  arguments  and  just  do  functions  with  a  fixed  number  of 
arguments.  We  list  the  arguments  giving  them  the  names  that  will  be  used  in  the 
definition  of  the  function.  These  names  have  nothing  to  do  with  any  name  that  may 
appear  outside  the  function  definition.  We  have  our  isolation. 

The  line  ends  with  a  colon  and  what  follows,  the  definition  of  the  function,  is  indented. 


Defining  our  function 

def my_function(limit):
    answer = []                               Names are used
    for n in range(0, limit):                 only in the function
        answer.append(n**2 + n + 41)

Function definition


So  we  follow  with  our  function  “body”.  This  indented  block  carries  the  actual  working  of 
the  function.  Note  that  any  variable  names  created  within  the  function  (including  the 
one  for  the  argument)  are  purely  internal.  If  there  are  variables  called  limit,  answer 
and  n  elsewhere  in  the  script  they  are  not  touched  by  this  function. 


Defining  our  function 

def my_function(limit):
    answer = []
    for n in range(0, limit):
        answer.append(n**2 + n + 41)
    return answer                             Pass back this value

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We  still  have  to  spit  out  the  function's  calculated  value.  This  is  done  with  the  new 
Python  keyword  “return”  which  can  only  be  used  in  a  function.  This  returns  the  value 
corresponding  to  the  name  answer. 

This  ends  our  definition  of  the  function  so  we  cease  the  indentation. 


Using  our  function 

def my_function(limit):
    answer = []
    for n in range(0, limit):
        answer.append(n**2 + n + 41)
    return answer

results = my_function(11)
“answer”               “limit”

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Now  that  we  have  our  function  defined  we  still  have  to  use  it. 

We  call  these  user-defined  functions  exactly  the  same  way  as  we  use  system-defined 
ones. 

Note  that  the  names  used  outside  the  function  definition  have  nothing  to  do  with  the 
names  used  within  the  definition.  It's  values  that  are  passed  in  and  out,  not  names. 
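
The isolation can be seen directly in a short sketch. The function is the one from the slides; the variables n and limit defined outside it, and the values given to them, are our own additions for illustration.

```python
def my_function(limit):
    answer = []
    for n in range(0, limit):
        answer.append(n**2 + n + 41)
    return answer

# Names outside the function that happen to match names inside it.
n = 'untouched'
limit = 3

# The value 11 is passed in; the list is passed out and given a new name.
results = my_function(11)

print(results)
print(n)       # still 'untouched': the n inside the function is a different n
print(limit)   # still 3
```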


Why use functions?

Reuse          If you use a function in lots of places and have to change it,
               you only have to edit it in one place.

Clarity        Clearly separated components are easier to read.

Reliability    Isolation of variables leads to fewer accidental clashes.

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So  we  can  define  our  own  functions.  So  what? 

There  are  lots  of  reasons  to  use  your  own  functions  in  your  code. 

The first reason is clarity. If you extract the nitty-gritty of how to do various operations
into  functions  then  you  can  string  the  function  calls  together  in  the  body  of  your  script 
or  program  and  the  whole  becomes  much  easier  to  read.  You  can  see  the  wood, 
because  the  trees  are  all  packaged  up  inside  functions. 

It  also  lets  you  write  more  reliable  code.  Because  your  functionality  is  chopped  up  into 
function-sized  chunks  you  can  check  those  pieces  individually.  If  you  have  one 
function  that  reads  data  from  a  file,  a  second  that  processes  the  data  and  a  third  that 
writes  the  processed  data  out  again  you  can  write  tests  for  those  three  pieces  of 
functionality  that  won't  trip  over  each  other. 

Finally,  hiving  off  functionality  to  functions  allows  those  lumps  of  functionality  to  be 
easily  copied  into  other  scripts.  (Actually,  we  don't  even  need  to  copy  them  as  we  will 
see  very  shortly.) 


A  “real”  worked  example 


Write  a  function  to  take  a  list  of  floating  point 
numbers  and  return  the  sum  of  the  squares. 


$(a_i) \mapsto \sum_i |a_i|^2$



Let's  take  some  real  examples,  both  in  the  sense  that  you  might  really  want  a  function 
that  does  this,  and  in  terms  of  how  you  might  write  it. 

We'll start by calculating the sum of the squares of a list of floating point numbers.


Example  1 

def norm2(values):
    sum = 0.0
    for value in values:
        sum = sum + value**2
    return sum

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This  isn't  the  best  implementation  in  the  world,  but  it  is  the  simplest. 

It  follows  a  very  common  pattern  for  “accumulating”  functions.  It  sets  up  an  initial  value 
(typically  zero  for  addition,  and  one  for  multiplication)  and  then  runs  through  its  input 
list,  accumulating  the  values  from  the  list.  Finally  it  returns  the  accumulated  value  to  end
the  function. 


Example  1 

print norm2([3.0, 4.0, 5.0])          50.0

$ python norm2.py
50.0        [3.0, 4.0, 5.0]
169.0       [12.0, 5.0]

There is an example of this function being used in the file norm2.py. This finds the
norm squared of two lists of numbers, once with an explicit list and once with a named
list.
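
We do not reproduce norm2.py itself here, but a sketch along the same lines might look like this. The accumulator is called "total" rather than "sum" to avoid hiding Python's built-in sum() function, and the name "sides" for the second list is our own.

```python
def norm2(values):
    # Accumulating pattern: start from zero and add each squared value.
    total = 0.0
    for value in values:
        total = total + value**2
    return total

# Once with an explicit list ...
explicit = norm2([3.0, 4.0, 5.0])

# ... and once with a named list.
sides = [12.0, 5.0]
named = norm2(sides)

print(explicit)
print(named)
```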


A  second  worked  example 


Write  a  function  to  pull  the 
minimum  value  from  a  list. 


$(a_i) \mapsto \min_i a_i$
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Here's  another  “real  world”  example.  Given  a  list  of  values,  return  the  minimum  value 
from  the  list. 


Example  2 

def minimum(a_list):
    a_min = a_list[0]
    for a in a_list:
        if a < a_min:
            a_min = a
    return a_min

When will this go wrong?

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This  is  an  example  of  a  function  that  won't  always  work.  There  is  one  circumstance 
when  it  will  fail.  What  is  it? 

There is an example of this script in minimum.py. This tries to find the minimum of
two lists: once with an explicit list, and once with a named list. There is a third attempt,
commented out, which demonstrates how the function can fail.


Example  2 


print minimum([2.0, 4.0, 1.0, 3.0])          1.0

$ python minimum.py
3.0         [4.0, 3.0, 5.0]
5           [12, 5]

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A  third  worked  example 

Write  a  function  to  “dot  product”  two  vectors. 

$(a_k, b_k) \mapsto \sum_k a_k b_k$



This  is  the  generalization  of  the  norm2( )  function. 


Example  3 

def dot(a_vec, b_vec):
    sum = 0.0
    for n in range(0, len(a_vec)):
        sum = sum + a_vec[n]*b_vec[n]
    return sum

When will this go wrong?

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Again,  this  simple  Python  implementation  fails  under  certain  circumstances.  The  index 
runs  over  the  length  of  the  first  list.  What  happens  if  the  second  list  is  longer?  Or 
shorter? 

There  is  an  example  of  this  script  in  dot_product.py.  This  calculates  two  dot  products, 
once  with  literal  values,  and  once  with  names.  It  also  has  two  examples  commented 
out  that  will  go  wrong  in  different  ways. 
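
The two ways of going wrong can be sketched as follows. This is not the content of dot_product.py; the lists are made up for the example and the accumulator is called "total" so as not to hide the built-in sum().

```python
def dot(a_vec, b_vec):
    total = 0.0
    for n in range(0, len(a_vec)):
        total = total + a_vec[n] * b_vec[n]
    return total

# Lists of equal length: the expected answer.
matched = dot([3.0, 4.0], [1.0, 2.0])

# Second list longer: the extra item is silently ignored.
longer = dot([3.0, 4.0], [1.0, 2.0, 100.0])

# Second list shorter: indexing past its end raises an IndexError.
try:
    dot([3.0, 4.0], [1.0])
    failed = False
except IndexError:
    failed = True

print(matched)
print(longer)
print(failed)
```

The silent case is arguably the more dangerous of the two, because the script carries on with an answer that looks plausible.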


Example  3 

print dot([3.0, 4.0], [1.0, 2.0])          11.0

$ python dot_product.py
11.0
115

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Example  3  —  version  2 

def dot(a_vec, b_vec):
    if len(a_vec) != len(b_vec):
        print 'WARNING: lengths differ!'
    sum = 0.0
    for n in range(0, len(a_vec)):
        sum = sum + a_vec[n]*b_vec[n]
    return sum

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If  there  are  circumstances  under  which  your  function  will  fail  or  will  give  misleading 
results,  it  is  always  a  good  idea  to  test  your  inputs. 

Remember:  functions  get  reused.  The  next  user  might  not  be  as  careful  as  you,  or 
might  not  even  know  the  limitation. 

Better  ways  to  handle  error  cases  are  presented  in  the  “Python:  Further  Use”  course. 


A  fourth  worked  example 

Write  a  function  to  filter  out  the 
positive  numbers  from  a  list. 

e.g. 

[1, -2, 0, 5, -5, 3, 3, 6] → [1, 5, 3, 3, 6]
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This  is  our  fourth  and  final  example.  Rather  than  a  simple  numerical  result,  this  one 
returns  a  list. 


Example  4 

def positive(a_list):
    answer = []
    for a in a_list:
        if a > 0:
            answer.append(a)
    return answer

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Within  the  function  body  we  use  one  of  our  classic  means  to  build  a  list.  We  start  with 
an empty list and append() elements to it one at a time.

There is an example script in positive.py. Note that it is quite permissible for an
empty list to be returned if there are no positive values in the input.
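
A short sketch of the function in use, with the list from the slide and a second, made-up list that contains no positive values:

```python
def positive(a_list):
    # Start with an empty list and append the qualifying items one at a time.
    answer = []
    for a in a_list:
        if a > 0:
            answer.append(a)
    return answer

kept = positive([1, -2, 0, 5, -5, 3, 3, 6])
nothing = positive([-1, 0, -3])   # zero is not positive, so nothing is kept

print(kept)
print(nothing)
```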


Progress 

Functions  ! 

Defining  them 

Using  them 
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Exercise 

Write  a  function  list_max( )  which  takes  two  lists 
of  the  same  length  and  returns  a  third  list  which 
contains,  item  by  item  the  larger  item  from  each  list. 

list_max([1, 5, 7], [2, 3, 6]) → [2, 5, 7]

Hint:  There  is  a  built-in  function  max(x,y) 
which  gives  the  maximum  of  two  values. 
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If  you  want  some  hints  for  how  to  solve  this  exercise,  look  at  the  third  and  fourth 
worked  examples  again. 

Hints: 

•  The  third  example  demonstrates  how  you  use  the  index  to  move  through  two  lists 
in  parallel. 

•  The  fourth  example  demonstrates  starting  with  an  empty  “answer”  list  and 
growing it an item at a time for each round of a for... loop.

•  Python has a function which returns the maximum of two simple values.

>>> max(1, 2)
2
>>> max(4.0, -5.0)
4.0

You  cannot  use  it  directly  on  a  list. 


How  to  return  more 

than  one  value? 

Write  a  function  to  pull  the 

minimum  and  maximum 

values  from  a  list. 
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To  date  our  functions  have  all  returned  a  single  value,  even  where  that  value  was  a 
list.  For  example  we  might  have  a  function  that  returns  the  minimum  value  from  a  list 
and  a  second  function  that  returns  the  maximum.  Why  can't  we  have  a  function  that 
returns  both  at  the  same  time? 

A  list  of  two  elements  is  not  an  appropriate  type  to  return.  The  pair  of  values  is  just 
that:  a  pair  of  values.  There's  no  reason  why  they  should  come  in  a  particular  order. 
There's  no  concept  of  the  third  item  in  the  list. 


Returning  two  values 

def min_max(a_list):
    a_min = a_list[0]
    a_max = a_list[0]
    for a in a_list:
        if a < a_min:
            a_min = a
        if a > a_max:
            a_max = a
    return (a_min, a_max)                     Pair of values

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We  should  have  no  problem  thinking  about  the  body  of  the  function  by  now.  But  what 
do  we  do  with  the  return  statement  to  return  two  values  at  the  same  time? 

We  do  it  by  returning  a  pair  of  values.  Python  indicates  these  by  separating  them  with 
a  comma.  This  pair  is  typically  surrounded  by  brackets  for  clarity,  but  actually  it's  the 
comma  that's  the  active  ingredient. 
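
To see that it really is the comma doing the work, here is a sketch in which the return statement has no brackets at all. The list of values and the names pair, smallest and largest are ours.

```python
def min_max(a_list):
    a_min = a_list[0]
    a_max = a_list[0]
    for a in a_list:
        if a < a_min:
            a_min = a
        if a > a_max:
            a_max = a
    # No brackets here: the comma alone builds the pair.
    return a_min, a_max

pair = min_max([3, 1, 4, 1, 5])

# The pair can be split into two names, with or without brackets.
(smallest, largest) = pair

print(pair)
print(smallest)
print(largest)
```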

There is an example of this in the script min_max.py.


Receiving  two  values 


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
(minval, maxval) = min_max(values)          Pair of variables

print minval
print maxval


So  we  can  emit  a  pair  of  values  from  the  innards  of  the  function.  How  do  we  pick  up 
those  values  on  the  outside  when  we  use  the  function? 

We  use  exactly  the  same  commas  and  brackets  notation  as  we  did  before. 

There is an example of this in the script min_max.py.


Pairs,  triplets,  ... 

singles 

doubles 

triples 

quadruples 

quintuples 


“tuples” 
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There's  a  posh  name  for  these  comma  separated  collections  of  values:  “tuples”. 

The  word  comes  from  the  name  given  once  we  get  past  “triples”  for  three  items 
together:  “quadruples”,  “quintuples”,  “hextuples”,  etc. 

We meet them often enough for them to deserve a few slides of consideration.


Tuples ≠ Lists

Lists                        Tuples
Concept of “next entry”      All items at once
Same types                   Different types
Mutable                      Immutable

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Tuples  are  not  quite  the  same  as  lists.  In  fact,  they  differ  very  significantly  from  lists  in 
a  few  technical  ways,  but  the  most  important  difference  is  conceptual. 

A  list  is  used  for  a  sequence  of  numbers  where  there  is  some  concept  of  successor; 
each  item  naturally  follows  the  one  before  it.  A  natural  question  to  ask  of  a  list  is  “is 
there  a  meaningful  way  to  extend  the  list?” 

A  tuple  is  used  when  all  the  items  happen  “at  once”. 

In  a  list,  where  there  is  a  concept  of  a  sequence,  the  items  tend  to  be  all  of  the  same 
type.  In  fact  we  recommend  that  you  only  use  lists  with  all  items  of  the  same  type. 

In a tuple, where there are just a number of items grouped together, there is no such
obligation.

Finally,  there  is  an  important  technical  difference.  We  saw  with  lists  that  we  could 
change  individual  elements  and  had  an  entire  section  contrasting  modifying  a  list  with 
replacing  a  list.  Tuples  are  immutable. 
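
The difference can be seen in a few lines. In this sketch (the values are made up) changing an item of a list works, while the same assignment on a tuple raises a TypeError, which we catch so that the script carries on.

```python
# A list can be modified in place.
numbers = [1, 2, 3]
numbers[0] = 10

# A tuple cannot: the assignment below raises a TypeError.
pair = (7.2, 0.5)
try:
    pair[0] = 8.0
    changed = True
except TypeError:
    changed = False

print(numbers)
print(pair)
print(changed)
```

To "change" a tuple we have to build a new one and attach the name to that instead.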


Tuple  examples 

Pair of measurements of a tree
    (height, width)          (7.2, 0.5)
    (width, height)          (0.5, 7.2)

Details about a person
    (name, age, height)      ('Bob', 45, 1.91)
    (age, height, name)      (45, 1.91, 'Bob')

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Here  are  some  examples  of  natural  use  of  tuples. 

Suppose we are measuring trees. We measure their widths and heights. These two
numbers are related (same tree) so we pair them up as we sling them round the
program, but they could come in either order. There is no natural order for these two
numbers.

We  could  handle  three  pieces  of  data  about  people  in  our  program,  for  example  their 
names,  ages  and  heights.  These  are  three  different  types  of  data  and  can  come  in  any 
order. 


Progress 

Tuples 

“not  lists” 

Multiple  values  bound  together 

Functions  returning  multiple  values 

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Exercise 

Copy  the  min_max()  function. 
Extend  it  to  return  a  triplet: 
(minimum,  mean,  maximum) 
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Copy  the  min_max  function  from  min_max.py. 

The  exercise  is  to  add  an  arithmetic  mean  to  the  values  returned  (turning  a  pair  into  a 
triple).  To  calculate  a  mean: 

1.  Set up a sum variable before the for... loop with initial value 0.0.

2.  Within  the  loop  add  each  value  encountered  to  sum. 

3.  After  the  for...  loop  is  complete  (so  not  indented)  calculate  mean  as  sum  divided  by 
the  length  of  the  list  of  values. 

4.  Return  the  three  values  rather  than  just  two. 


Tuples  and 
string  substitution 

“Hello,  my  name  is  Bob  and  I'm  46  years  old.” 


We'll take a quick break from functions for a moment. Now that we have tuples we
ought to look at what else we can do with them.

Suppose we want to take some values (e.g. from a tuple) and substitute them into a
string, mail-merge-style.


Simple  string  substitution 


>>> 'My name is %s.' % 'Bob'
'My name is Bob.'

%s    Substitution marker: substitute a string.
%     Substitution operator

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Let's  start  with  a  single  substitution. 

We  take  a  string  containing  the  magic  code  “%s”  marking  where  we  want  the  string  to 
be  inserted.  It  will  substitute  for  the  %s. 

We  follow  the  string  with  the  substitution  operator,  “%”.  This  has  nothing  to  do  with  the 
arithmetic  use  of  the  same  character. 

We  follow  the  substitution  operator  with  the  string  to  be  inserted.  The  “%s”  means  that 
the  substitution  is  expecting  a  string  and  a  string  must  be  provided. 

The  result  is  the  original  string  with  “%s”  replaced. 


Simple  integer  substitution 

>>> 'I am %d years old.' % 46
'I am 46 years old.'

%d    Substitution marker: substitute an integer.

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We  can  do  exactly  the  same  thing  with  a  “%d”  to  indicate  an  integer. 


Tuple  substitution 


>>> '''My name is %s and                          Two markers
... I am %d years old.''' % ('Bob', 46)           A pair
'My name is Bob and\nI am 46 years old.'

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And  this  is  where  tuples  come  in.  Suppose  we  want  to  substitute  a  string  and  an 
integer.  If  we  follow  the  substitution  operator  with  a  tuple  then  the  markers  in  the  string 
get  replaced  in  order  from  the  tuple. 
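
The three kinds of substitution seen so far can be put side by side in a short sketch. The variable names are ours, and the two-marker string is kept on one line for simplicity.

```python
# One marker, one value.
name_only = 'My name is %s.' % 'Bob'
age_only = 'I am %d years old.' % 46

# Two markers, so the values come as a tuple and are used in order.
both = 'My name is %s and I am %d years old.' % ('Bob', 46)

print(name_only)
print(age_only)
print(both)
```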


Lists  of  tuples 


data = [                                 List of tuples
    ('Bob', 46),
    ('Joe', 9),
    ('Methuselah', 969)
    ]

for (person, age) in data:               Tuple of variable names
    print '%s %d' % (person, age)

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In  practice  we  might  see  something  like  this. 

Our  data  comes  as  a  list  of  tuples  (or  something  treated  like  a  list).  We  can  use  a  tuple 
of  variable  names  to  identify  these  values  in  a  for...  loop. 


Problem:  ugly  output 

Bob 46
Joe 9
Methuselah 969

Columns should align:

Bob         46
Joe          9
Methuselah 969

Columns of numbers should be right aligned.

Trouble is, it produces really ugly output. Typically with lists of data like that we want
them aligned in columns. Numbers, typically, get right aligned for easy comparison
too.


Solution:  formatting 

'%s' % 'Bob'          →  'Bob'
'%5s' % 'Bob'         →  '␣␣Bob'       Five characters, right aligned
'%-5s' % 'Bob'        →  'Bob␣␣'       Left aligned
'%5s' % 'Charles'     →  'Charles'

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We  have  a  solution.  The  substitution  operators  have  a  set  of  modifiers  that  let  us 
change  the  details  of  the  substitution.  For  example, 

The  simplest  are  for  the  strings.  Adding  a  number  between  the  “%”  and  the  “s” 
specifies  how  many  characters  should  be  assigned  to  the  string.  The  string  is  right 
aligned.  If  we  specify  a  negative  number  it  is  left  aligned.  (These  defaults  make  more 
sense  for  numbers).  If  the  string  being  inserted  is  too  long  then  it  just  overflows;  it 
does  not  truncate. 
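
The three behaviours described above can be checked directly. This is only an illustrative sketch; the variable names right, left and overflow are ours.

```python
# A width between "%" and "s" pads the string to at least that many characters.
right = '%5s' % 'Bob'          # padded on the left: right aligned
left = '%-5s' % 'Bob'          # negative width: padded on the right
overflow = '%5s' % 'Charles'   # longer than the width: left untouched

# repr() shows the quotes, so the padding spaces are visible.
print(repr(right))
print(repr(left))
print(repr(overflow))
```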


Solution:  formatting 


'%d'   % 46  ->  '46'

'%5d'  % 46  ->  '   46'

'%-5d' % 46  ->  '46   '

'%05d' % 46  ->  '00046'

There  is  similar  formatting  for  integers.  There  is  an  additional  option  for  integers  where 
a  “0”  is  inserted  before  the  width  specifier.  This  pads  the  number  out  with  leading 
zeroes. 


Columnar  output 

data = [
    ('Bob', 46),
    ('Joe', 9),
    ('Methuselah', 969)
    ]

for (person, age) in data:
    print '%-10s %3d' % (person, age)

Properly formatted

We now have everything we need for formatted output.


Floats 

'%f'   % 3.141592653589  ->  '3.141593'

'%.4f' % 3.141592653589  ->  '3.1416'

'%.4f' % 3.1             ->  '3.1000'

Finally,  we  need  to  look  at  floating  point  numbers.  These  have  many  more  options  and 
we  will  restrict  ourselves  to  just  the  most  useful  here.  Note  that  truncating  a  floating 
point  number  causes  it  to  be  rounded. 
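
A short sketch of the float markers follows. The last line combines a width with a precision ('%8.2f'), which goes slightly beyond what the slide shows but uses the same modifiers already met for strings and integers. The variable names are ours.

```python
pi = 3.141592653589

default = '%f' % pi      # six decimal places unless told otherwise
four_dp = '%.4f' % pi    # rounded to four places, not simply chopped
padded = '%8.2f' % pi    # at least eight characters wide, two decimal places

print(default)
print(four_dp)
print(repr(padded))
```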


Progress 


Formatting operator     '%s %d' % ('Bob', 46)

Formatting markers      %s  %d  %f

Formatting modifiers    %-4s

We have taken a quick tour of formatting and string substitution using tuples (or a
single value for the simplest cases). There is a fuller set of formatting codes in a separate
handout.


Exercise 

Complete the script format1.py
to generate this output:

Alfred 46  1.90
Bess   24  1.75
Craig   9  1.50
Diana 100  1.66
^       ^     ^
1       9     15

Edit the script format1.py to complete this exercise. I suggest you attack the problem
in stages.

1.  Get  the  basic  %X  symbols  right. 

2.  Then  work  on  the  name  column  and  get  it  right 

3.  Then  work  on  the  age  column 

4.  Add  some  spaces  to  get  the  height  column  right. 

5.  Format  the  height  column  for  two  decimal  places. 


Reusing  our  functions 

Want to use the same function in many scripts

Copy?               Have to copy any changes.

Single instance?    Have to import the set of functions.

Now let's get back to the idea of reusing a function. We can reuse a function within a
script  easily.  Its  definition  is  written  once  near  the  top  of  the  script  and  we  use  it 
multiple  times  within  the  script.  If  we  change  (e.g.  fix)  the  function  definition,  all  the 
points  in  the  script  that  use  the  function  immediately  benefit. 

Now  suppose  we  had  written  a  really  useful  function  that  we  wanted  to  use  in  lots  of 
different  scripts. 

We  can,  of  course,  just  copy  the  function's  definition  from  one  script  to  another. 
However,  if  we  change  (fix)  the  function  definition  in  one  script  we  have  to  repeat  the 
edit  in  all  our  scripts. 

What  we  want  is  a  mechanism  to  use  a  single  definition  in  multiple  scripts.  This  is 
called  “importing  the  function”. 


How  to  reuse  —  0 


def min_max(a_list):
    ...
    return (a_min, a_max)

vals = [1, 2, 3, 4, 5]

(x, y) = min_max(vals)
print (x, y)

five.py

Let's do this as a worked example.

In an earlier exercise we wrote a function that generates (simultaneously) the
minimum and maximum of a list and returns them as a pair. We have an example of this
in a script called five.py. We are going to split the definition of this function out from
the script that uses it.

$  python  five.py 

(1,  5) 


How to reuse - 1


vals = [1, 2, 3, 4, 5]

(x, y) = min_max(vals)
print (x, y)

five.py

def min_max(a_list):
    ...
    return (a_min, a_max)

utils.py

Move the definition of the function to a separate file.

The first thing we do is to cut and paste the definition into a different, new file. We will
call it utils.py (short for “utilities”).

The script five.py will no longer work. It cannot find the definition of the min_max()
function it uses.

$ python five.py

Traceback (most recent call last):
  File "five.py", line 3, in <module>
    (x, y) = min_max(vals)
NameError: name 'min_max' is not defined


How to reuse - 2

import utils
vals = [1, 2, 3, 4, 5]

(x, y) = min_max(vals)
print (x, y)

five.py

def min_max(a_list):
    ...
    return (a_min, a_max)

utils.py

Identify the file with the functions in it.

So now we modify five.py to import the min_max() function from utils.py.

First, we tell the script where to get some more functions from. We do this with the
command

import utils

This causes Python to go looking for a file called utils.py which contains functions.
Don't worry about where it goes looking; it includes a set of system locations and your
current directory.

On  its  own  this  isn't  sufficient. 

$ python five.py

Traceback (most recent call last):
  File "five.py", line 4, in <module>
    (x, y) = min_max(vals)
NameError: name 'min_max' is not defined

We need to tell Python that min_max() is supposed to come from that import. (There
may be several imports or it may be meant to come from the current script, or even
the system.)


How to reuse - 3


import utils
vals = [1, 2, 3, 4, 5]

(x, y) = utils.min_max(vals)
print (x, y)

five.py

def min_max(a_list):
    ...
    return (a_min, a_max)

utils.py

Indicate that the function comes from that import.

We indicate that min_max() now comes from the utils.py file by prefixing
“utils.” to its name.

Now  it  works  again: 

$  python  five.py 

(1,  5) 
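
For readers who want to try the whole import mechanism in one go, here is a self-contained sketch. It is not how the course files are laid out: it writes a tiny stand-in module called demo_utils.py (a hypothetical name) into a temporary directory, adds that directory to the list of places Python searches, and then imports it. The stand-in min_max() simply uses the built-in min() and max() rather than the loop written in the earlier exercise.

```python
import os
import sys
import tempfile

# Create a small module file in a temporary directory.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, 'demo_utils.py')
with open(module_path, 'w') as module_file:
    module_file.write(
        'def min_max(numbers):\n'
        '    return (min(numbers), max(numbers))\n'
    )

# sys.path is the list of directories Python searches on import.
sys.path.insert(0, module_dir)
import demo_utils

# The function must be named with its module prefix.
(low, high) = demo_utils.min_max([1, 2, 3, 4, 5])
print('%d %d' % (low, high))
```

In everyday use none of the sys.path manipulation is needed: a module file sitting in the same directory as the script is found automatically.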


A  library  of  our  functions 


“Module”: a container for

    Functions
    Objects
    Parameters


This collection of functions in a .py file is called a “module”. Actually, a module can
contain more than just functions but that's what we are going to be most interested in
for this course.

A module is a collection of functions, types of object and various parameters, all
bound together, brought in by a single import statement and all with the same dotted
prefix to identify where they came from.


System modules

os            operating system access
subprocess    support for child processes
sys           general system functions
math          standard mathematical functions
numpy         numerical arrays and more
scipy         maths, science, engineering
csv           read/write comma separated values
re            regular expressions

A huge number of modules exist built in to Python, or typically provided alongside it.
Here  are  just  a  few  of  the  more  useful  ones. 

Python  keeps  its  language  simple  by  hiving  off  most  of  the  complexities  of  special 
circumstances  into  modules  that  you  only  import  if  you  need  that  particular  piece  of 
functionality. 

“There's  a  module  for  that”  is  the  standard  answer  to  almost  all  “how  do  I ...”  questions 
in  Python. 


Using  a  system  module 

>>> import math

>>> math.sqrt(2.0)
1.4142135623730951

>>>

Keep track of the module with the function.


Let's take an example. We will work interactively here just for convenience.

Python itself does not support most mathematical functions. These are in the “math”
module (beware American spelling). So if we want the square root of a real number we
need the math.sqrt() function, i.e. the sqrt() function from the math module.


Don't  do  this 

>>> from math import sqrt

>>> sqrt(2.0)
1.4142135623730951

>>>

There are a couple of short cuts that we want to advise you away from. These are
syntactically  legal  but  lead  to  confusion  and  the  author  of  Python,  Guido  van  Rossum, 
allegedly  regrets  ever  having  permitted  them. 

You  can  import  a  single  function  from  a  module  and  then  use  it  without  identifying 
which  module  it  comes  from. 

Don't  do  that. 


Really  don't  do  this 


>>> from math import *

>>> sqrt(2.0)
1.4142135623730951

>>>

You can even import all the functions from a module and use them without identifying
the  module  they  come  from. 

Really  don't  do  that. 


Do  do  this 

>>> import math

>>> help(math)

Help on module math:

NAME
    math

DESCRIPTION
    This module is always available. It
    provides access to the mathematical
    functions defined by the C standard.

So how do you find your way around a new module?

One of the things that should be built in to a module is its own documentation. You
may request help on any imported module by issuing the Python command help()
on the module name. The module must be imported before you ask for help on it.


Progress 

“Modules”

System modules

Personal modules

import module

module.function(...)


Exercise

1. Edit your utils.py file.

2. Write a function print_list() that prints
all the elements of a list, one per line.

3. Edit the elements2.py script to use this new
function.


Interacting with the system

>>> import sys


So  now  we  can  start  looking  at  the  modules  that  come  with  every  Python 
implementation. 

The  sys  module  provides  the  hooks  for  interacting  with  the  system  in  an  operating 
system  neutral  fashion.  (There  is  a  separate  module  for  the  operations  that  do  depend 
on  the  operating  system.) 


Standard  input  and  output 

>>> import sys

sys.stdin     Treat like an open(..., 'r') file

sys.stdout    Treat like an open(..., 'w') file

So, what's in sys?

First, we will look at two objects (rather than functions) that are very useful if you write
in the classic “filter” style:

$ python script.py < input_file > output_file

The object sys.stdin corresponds to the “standard input” (input_file in our
example) as an already opened file object. The sys.stdout object is the equivalent
for the “standard output” (output_file in our example).


Line-by-line  copying  —  1 


import sys

for line in sys.stdin:
    sys.stdout.write(line)


Import module


No need to open() sys.stdin or sys.stdout.
The module has done it for you at import.


For example, here is a complete Python script for copying one file to another line by
line.


Line-by-line copying - 2

import sys

for line in sys.stdin:
    sys.stdout.write(line)

Standard input

Treat a file like a list -> Acts like a list of lines

Note the usual trick with an open file object: if we treat it like a list it behaves like a list
of lines. The sys.stdin object is just an open file. The only difference is that it was
opened for us.


Line-by-line  copying  —  3 

import sys

for line in sys.stdin:
    sys.stdout.write(line)

Standard output: an open file

The file's write() method

Similarly, we treat sys.stdout as an open file (opened for writing). We don't need to
open it; the system has done that for us.


Line-by-line  copying  —  4 


import sys

for line in sys.stdin:
    sys.stdout.write(line)


Lines in... lines out


$ python copy.py < in.txt > out.txt

Copy

We can now copy a file. Great.


Now  copying  a  file  is  pretty  much  pointless.  We  have  cp  for  that. 

However,  the  general  shape  of  the  script  opens  up  the  route  to  two  different,  very 
commonly  needed  operations. 

The  first  is  where  we  change  the  lines,  or  process  them  in  some  way. 

The  second  is  where  we  only  write  them  out  again  if  some  criterion  is  satisfied. 

An  extreme  third  case  is  where  we  gather  statistics  as  we  go  and  print  them  out  only 
at  the  end. 


Line-by-line  rewriting 

import sys

Define or import a function here

for input in sys.stdin:
    output = function(input)
    sys.stdout.write(output)

$ python process.py < in.txt > out.txt

Process

The standard script for modifying a line would be this. Notice that we tend to separate
the line-by-line rewrite and the process of running through the lines by splitting the
rewrite off to a function.
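
To make the pattern concrete, here is a sketch with an invented rewrite function, shout(), that converts each line to upper case. So that it can be run without redirecting any files, the loop is wrapped in a helper, rewrite(), and fed from in-memory file objects; both function names and the sample text are our own choices. In a real filter script the last lines would be replaced by rewrite(sys.stdin, sys.stdout, shout).

```python
import io

def shout(line):
    """Hypothetical per-line rewrite: convert the line to upper case."""
    return line.upper()

def rewrite(source, sink, function):
    """Apply function to each line of source and write the result to sink."""
    for line in source:
        sink.write(function(line))

# In-memory stand-ins for sys.stdin and sys.stdout.
source = io.StringIO(u'first line\nsecond line\n')
sink = io.StringIO()

rewrite(source, sink, shout)
print(sink.getvalue())
```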


Line-by-line  filtering 

import sys

Define or import a test function here

for input in sys.stdin:
    if test(input):
        sys.stdout.write(input)

$ python filter.py < in.txt > out.txt

Filter


Similarly, this is the model for optionally writing out the line or not.
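
As an illustration of the filtering model, the sketch below keeps only the lines that do not start with a “#” character. The test function is_data_line(), the helper filter_lines() and the sample input are all invented for this example, and in-memory file objects stand in for sys.stdin and sys.stdout so that it runs on its own.

```python
import io

def is_data_line(line):
    """Hypothetical test: keep lines that do not start with '#'."""
    return not line.startswith('#')

def filter_lines(source, sink, test):
    """Copy to sink only those lines of source for which test(line) is true."""
    for line in source:
        if test(line):
            sink.write(line)

# In-memory stand-ins for sys.stdin and sys.stdout.
source = io.StringIO(u'# header\n1.0 2.0\n# note\n3.0 4.0\n')
sink = io.StringIO()

filter_lines(source, sink, is_data_line)
print(sink.getvalue())
```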


Progress 

sys module

sys.stdin       Standard input

sys.stdout      Standard output

“Filter” scripts

    process line-by-line

    only output on certain input lines


Exercise

Write a script that reads from standard input.

It should generate two lines of output:

Number of lines: MMM

Number of blank lines: NNN

Hint: len(line.split()) == 0 for blank lines.

Blank lines may have spaces on them. The best test for blank lines is to take the line
and to split it into “words”. If there are none, count the line as blank.


The  command  line 

We are putting parameters in our scripts.

number = 1.25

We want to put them on the command line.

$ python script.py 1.25

Now let's look at another facility that the sys module gives us.

To  date  we  are  setting  our  parameters  explicitly  in  the  script  itself.  We  really  want  to 
enter  them  on  the  command  line. 


Reading  the  command  line 


import sys
print (sys.argv)

$ python args.py 1.25

['args.py', '1.25']

sys.argv[0]    Script's name

sys.argv[1]    First argument (a string!)

The sys module provides an object sys.argv which is a list of all the command line
arguments. We can see this with a trivial script that prints it out.

We should notice a couple of significant points:

The name of the script itself is item zero in the sys.argv list.

All the items on the command line are presented as strings.


Command  line  strings 


import sys
number = sys.argv[1]
number = number + 1.0

print(number)

Traceback (most recent call last):
  File "thing.py", line 3, in <module>
    number = number + 1.0
TypeError: cannot concatenate 'str' and 'float' objects

Because all command line arguments are presented as strings we can't treat
numerical arguments as numbers straight away.


Using  the  command  line 


import sys
number = float(sys.argv[1])
number = number + 1.0

print(number)

Enough arguments?
Valid as floats?


We have to convert them to the correct type. We have already met the type
conversion functions. If we want a floating point number on the command line we use
the float() function to convert from the given string to the desired float.
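
The slide asks two questions that the simple script does not answer: are there enough arguments, and are they valid as floats? The sketch below shows one possible way to check both. It uses try and except ValueError, which may not have been covered at this point in the course, and a hand-written list called arguments as a stand-in for sys.argv; the helper name to_float() is our own.

```python
def to_float(text):
    """Return text converted to a float, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None

arguments = ['thing.py', '1.25']   # stand-in for sys.argv

if len(arguments) < 2:
    print('Usage: thing.py NUMBER')
else:
    number = to_float(arguments[1])
    if number is None:
        print('Not a number: %s' % arguments[1])
    else:
        print(number + 1.0)
```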


Better  tools  for 
the  command  line 

argparse module    Very powerful parsing
                   Experienced scripters

Manual parsing of the command line will do for simple scripts. There is a module
dedicated to parsing the command line called “argparse”. This is more suitable for
slightly more experienced scripters, but is exceptionally powerful.


General  principles 

1. Read in the command line

2. Convert to values of the right types

3. Feed those values into calculating functions

4. Output the calculated results

Here's a general approach for scripts that process the command line. The important
bit is that the parsing of the command line from string to directly usable values should
be split off into a function, which can then be independently tested.
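
One way to make the parsing function easy to test on its own is to hand it the list of strings as a parameter instead of having it read sys.argv directly. This is a variation for illustration only; the function name parse_numbers() and the sample lists are assumptions of ours. The script proper would call it as parse_numbers(sys.argv).

```python
def parse_numbers(argv):
    """Convert a list like ['script.py', '0.5', '5'] into a (float, int) pair."""
    return (float(argv[1]), int(argv[2]))

# Because the list is a parameter we can test without touching the command line.
(power, number) = parse_numbers(['script.py', '0.5', '5'])
print('Power: %f' % power)
print('Points: %d' % number)
```

The %f and %d markers double as a type check: the substitution fails if the values that come back are not numbers.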


Worked  example 

Write a script to print points
(x, y) with y = x^r, x in [0, 1], uniformly spaced

Two command line arguments:
r (float)      power
N (integer)    number of points

So, let's go. We will write a “proper” program that reads input from the command line
and uses it to control its output.

We will consider as our goal a script that takes a floating point number, r, and an
integer, N, from the command line and writes N points (x, y) on the curve y = x^r to
standard output, with x uniformly spaced from 0.0 to 1.0 inclusive.


General  approach 

1a. Write a function that parses the command line for
a float and an integer.

1b. Write a script that tests that function.

2a. Write a function that takes (r, N) as (float, integer)
and does the work.

2b. Write a script that tests that function.

3. Combine the two functions.

Unsurprisingly, we are going to split it up into functions. Splitting up a problem into
components and implementing each component as a function is the key to successful
programming. There are two components to our problem. We need to get the
command line arguments into forms we can use: one float and one integer. We also
need to take these two values and output the corresponding points.


1a. Write a function that parses the command line for
a float and an integer.


curve.py


The first function has to parse the command line. We are expecting two arguments so
we simply convert them and return a pair (tuple) of the two values. If there are not
enough command line arguments or if they cannot be interpreted as the right sort of
number then this function will fail and the script will halt.


1b. Write a script that tests that function.


import sys

def parse_args():
    ...

(r, N) = parse_args()
print 'Power: %f' % r
print 'Points: %d' % N

curve.py

We write a simple test. The parsing of the command line has to return objects of the
correct type and value. So we simply print out their values from within a substitution,
which will fail if they are not of the expected types.


1b. Write a script that tests that function.

$ python curve.py 0.5 5

Power: 0.500000

Points: 5

It works!


2a. Write a function that takes (r, N) as (float, integer)
and does the work.


curve.py


The second function takes a float and an integer (presumed already converted from
the argument strings) and outputs the data we want.

Note:

• range(0, num_points) gives num_points values as desired, but its maximum value is
num_points-1. Because of this we divide by num_points-1 when calculating x,
and not by num_points.

• index starts as an integer, as does num_points-1. We explicitly convert both to
floats prior to dividing one by the other to make sure we get a float afterwards. (If we
did it in integers every value except the last would be 0.)

Our function does not need to return any value because it is just printing output and
doesn't need to report back.
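
The two notes can be checked with a small variation on the function. Unlike the course's version, which prints as it goes, this sketch collects the points in a list and returns them so that the count and the end values are easy to inspect. The name curve_points() is ours, and the sketch assumes num_points is at least 2 (with 1 the division would be by zero).

```python
def curve_points(power, num_points):
    """Return num_points (x, y) pairs on y = x**power, x from 0.0 to 1.0."""
    points = []
    for index in range(0, num_points):
        # index runs from 0 to num_points-1, so dividing by num_points-1
        # makes x end exactly at 1.0.  float() avoids integer division.
        x = float(index) / float(num_points - 1)
        points.append((x, x ** power))
    return points

for (x, y) in curve_points(0.5, 5):
    print('%f %f' % (x, y))
```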


2b. Write a script that tests that function.


def power_curve(pow, num_points):
    ...

power_curve(0.5, 5)

Next we need to test our function.

We run it with values passed explicitly in the script. Its specification is that it must
produce a certain number of points satisfying a power law. So we have two checks we
need to make. Does it produce the correct number of points and are they
mathematically correct?


2b. Write a script that tests that function.


$ python curve.py

0.000000 0.000000
0.250000 0.500000
0.500000 0.707107
0.750000 0.866025
1.000000 1.000000

Yes, they are. It works.


3.  Combine  the  two  functions. 


import sys

def parse_args():
    # Command line arguments arrive as strings; convert them.
    pow = float(sys.argv[1])
    num = int(sys.argv[2])
    return (pow, num)

def power_curve(pow, num_points):
    for index in range(0, num_points):
        # Divide as floats so that x runs from 0.0 to 1.0.
        x = float(index)/float(num_points-1)
        y = x**pow
        print '%f %f' % (x, y)

(power, number) = parse_args()
power_curve(power, number)


curve.py


Now that we trust our two functions, we combine them to create the script's final
functionality:

(1)  parse  the  command  line  to  get  the  power  and  number  of  points 

(2)  print  that  many  points  on  the  power  curve 


Progress 

Parsing the command line

sys.argv

Convert from strings to useful types

int()  float()


Exercise

Write a script that takes a command line of
numbers and prints their minimum and maximum.

Hint: You have already written a min_max function.
Reuse it.


Back to our own module

>>> import utils
>>> help(utils)

Help on module utils:

NAME
    utils

FILE
    /home/rjd4/utils.py

FUNCTIONS
    min_max(numbers)

    ...

We want to do better than this.

We have seen that the system modules come with their own help. What does ours
come  with? 

We  can  ask  for  help  and  we  get  a  minimal,  automatically  generated  help  text.  We  want 
to  be  able  to  add  to  this. 


Function  help 

>>> import utils
>>> help(utils.min_max)

Help on function min_max in module utils:

min_max(numbers)

We can also ask for help on specific functions in our module and we get just the basic
information  there. 

We  want  to  be  able  to  add  help  text  to  individual  functions  as  well  as  the  module  as  a 
whole. 


Annotating  a  function 


def min_max(numbers):
    minimum = numbers[0]
    maximum = numbers[0]
    for number in numbers:
        if number < minimum:
            minimum = number
        if number > maximum:
            maximum = number
    return (minimum, maximum)

Our current file

We will start by annotating an individual function.


A “documentation string”

def min_max(numbers):
    """This functions takes a list
    of numbers and returns a pair
    of their minimum and maximum.
    """
    minimum = numbers[0]
    maximum = numbers[0]
    for number in numbers:
        if number < minimum:
            minimum = number
        if number > maximum:
            maximum = number
    return (minimum, maximum)

A string before the body of the function.

What we will do is simply place a string immediately after the def line and before any
of the active lines in the function's definition. (Comments don't count.)

Because this is often a long string it is traditional to use triple quotes. It doesn't matter;
it's just a string.
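
The string is not thrown away: Python stores it on the function as an attribute called __doc__, and that is where help() finds its text. The sketch below uses a shortened, single-line documentation string of our own wording to show this.

```python
def min_max(numbers):
    """Return a pair holding the minimum and maximum of a list of numbers."""
    minimum = numbers[0]
    maximum = numbers[0]
    for number in numbers:
        if number < minimum:
            minimum = number
        if number > maximum:
            maximum = number
    return (minimum, maximum)

# The documentation string travels with the function.
print(min_max.__doc__)
```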


Annotated  function 

>>> import utils
>>> help(utils.min_max)

Help on function min_max in module utils:

min_max(numbers)
    This functions takes a list
    of numbers and returns a pair
    of their minimum and maximum.

Now if we ask for help on that function we get the text we inserted.


Annotating a module


"""A personal utility module
full of all the pythonic goodness
I have ever written.
"""

def min_max(numbers):
    """This functions takes a list
    of numbers and returns a pair
    of their minimum and maximum.
    """
    minimum = numbers[0]
    maximum = numbers[0]
    for number in numbers:
        ...

A string before any active part of the module.

How do we annotate the module as a whole?

We add another string to the file, this time before any of the active lines.


Annotated  module 

>>> import utils
>>> help(utils)

Help on module utils:

NAME
    utils

FILE
    /home/rjd4/utils.py

DESCRIPTION
    A personal utility module
    full of all the pythonic goodness
    I have ever written.

And  we  get  the  text  out  again  when  we  ask  for  help  on  the  module. 


Progress 

Annotations

    ...of functions

    ...of modules

“Doc strings”

help()


Exercise


Annotate your utils.py and the functions in it.


3 minutes


Simple  data  processing 


input data  ->  Python script  ->  output data

What format?


We now have just about enough Python to write some serious scripts. We need one
more feature, and we will meet it by looking at how to do data processing.

First of all we ought to look at the sorts of files that contain our data. What format is
the data in?


Comma  Separated  Values 


A101,Joe,45,1.90,100
G042,Fred,34,1.80,92
H003,Bess,56,1.75,80

A very common, and very useful, format is called “comma separated values”. This is
usually marked by a suffix “.csv” on the file name. It is a common interchange format
for spreadsheets.

Each record is a row. Each column is separated from its neighbours by a comma.
Sometimes the individual values are in quotes.


Quick  and  dirty  .csv  —  1 

CSV: “comma separated values”

>>> line = '1.0, 2.0, 3.0, 4.0\n'

More likely to have come from sys.stdin

>>> line.split(',')

Split on commas rather than spaces.

['1.0', ' 2.0', ' 3.0', ' 4.0\n']

Note the leading and trailing white space.

Here's a quick way to chop up a line at the commas. The split() method takes an
optional argument which is the character to split on. Note that the strings in the list
have some strange spaces in them. Don't worry; the float() conversion function can
handle them.


Quick  and  dirty  .csv  —  2 

>>> line = '1.0, 2.0, 3.0, 4.0\n'

>>> strings = line.split(',')

>>> numbers = []

>>> for string in strings:
...     numbers.append(float(string))
...

>>> numbers
[1.0, 2.0, 3.0, 4.0]


This is a straightforward conversion.


Quick  and  dirty  .csv  —  3 

Why “quick and dirty”?

Can't cope with common cases:

Quotes    '"1.0", "2.0", "3.0", "4.0"'

Commas    'A, B\, C, D'

Dedicated module: csv

Don't push the simple split() trick too far, though. There are many cases it can't
cope with. If you want to handle CSV files for real you should use the csv module
written for just that purpose.
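
A brief comparison shows the difference. The two sample lines below are invented for this sketch; the second has a comma inside a quoted value. The csv.reader() function accepts any sequence of lines, so a plain list of strings can stand in for a file.

```python
import csv

lines = ['"1.0","2.0","3.0","4.0"', '"Smith, J",45']

# The split() trick cuts the quoted name in two and keeps the quote marks.
naive = lines[1].split(',')
print(naive)

# The csv module understands the quoting rules.
rows = list(csv.reader(lines))
print(rows[0])
print(rows[1])
```

Note that the values still come back as strings; converting them with float() or int() remains our job.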


Proper  .csv 

Dedicated module: csv

import csv
import sys

input = csv.reader(sys.stdin)
output = csv.writer(sys.stdout)

for [id, name, age, height, weight] in input:
    output.writerow([id, name, float(height)*100])

Much more in the “Python: Further Topics” course


The csv module would work like this. Don't worry about the specifics; there's a proper
coverage of the module in the “further topics” Python course.


Processing  data 


Storing data in the program

id      name    age    height    weight
A101    Joe     45     1.90      100
G042    Fred    34     1.80      92
H003    Bess    56     1.75      80
...

? id -> (name, age, height, weight) ?

So now that we can read tabular or columnar data, how can we store it within the program?
Let's consider a case where we want to map from some text key or ID to a tuple of
data.


Simpler  case 

Storing data in the program

id      name
A101    Joe
G042    Fred
H003    Bess

? id -> name ?

Let's start with a simple case where we map from an id string to a single string value
as opposed to a tuple.


Not  the  same  as  a  list... 

index    name
0        Joe
1        Fred        names[1] = 'Fred'
2        Bess
...

['Joe', 'Fred', 'Bess', ...]

This mapping from string id to value is different from a list. A list is indexed by
position. We need to index by string.


...but  similar:  a  “dictionary” 

id      name
A101    Joe
G042    Fred        names['G042'] = 'Fred'
H003    Bess
...

{'A101':'Joe', 'G042':'Fred', 'H003':'Bess', ...}

We are going to use a different structure. We want something that takes an arbitrary
Python object (rather than just an integer) and looks up a corresponding value. Python
has such a type, called a “dictionary”.


Dictionaries 


“key”     ->  “value”

'G042'    ->  'Fred'
1700045   ->  29347565
'G042'    ->  ('Fred', 34)
(34, 56)  ->  'treasure'
(5, 6)    ->  [5, 6, 10, 12]


Generalized look up

Python object (immutable)  ->  Python object (arbitrary)

string  ->  string
int     ->  int
string  ->  tuple
tuple   ->  string
tuple   ->  list

A dictionary maps from an arbitrary Python type (strictly speaking, any immutable
Python type) to an arbitrary (mutable or immutable) type.

The  jargon  is  that  instead  of  an  index  a  dictionary  has  a  “key”  which  it  maps  to  a 
“value”. 

We  can  map  from  strings  to  strings  (a  very  common  case),  or  from  strings  to  tuples 
(which  we  want  to  do  here). 
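
As a rough illustration, the sketch below builds one dictionary whose keys and values have the types shown on the slides. The name `lookup` is made up for this example. The last few lines try to use a list as a key, which Python refuses because a list is mutable; the try/except is only there to show that refusal without stopping the script.

```python
# Keys of several immutable types, values of arbitrary types.
lookup = {
    'G042': ('Fred', 34),        # string -> tuple
    1700045: 29347565,           # int -> int
    (34, 56): 'treasure',        # tuple -> string
    (5, 6): [5, 6, 10, 12],      # tuple -> list
}

print(lookup['G042'])
print(lookup[(5, 6)])

# A list is mutable, so it cannot be used as a key.
try:
    lookup[[5, 6]] = 'not allowed'
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)
```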


Building a dictionary — 1

data = { 'A101':'Joe' , 'G042':'Fred' , 'H003':'Bess' }

Curly brackets:  enclose the whole dictionary
Comma:           separates the items
Items:           key, colon, value

A101 -> Joe
G042 -> Fred
H003 -> Bess

So how do we build a dictionary?

We can create it all in one go as shown. The dictionary is delimited with curly brackets
(as opposed to a list's square brackets) and the individual elements are separated by
commas, just like a list. The elements themselves, however, are composite. Each is
a key/value pair separated by a colon. In a list, the order in which the elements are
given defines their indices. With a dictionary, where we use a key rather than an index,
we have to give both parts explicitly.


Building  a  dictionary  —  2 

data = {}                   # Empty dictionary

data['A101'] = 'Joe'        # Key in square brackets, value on the right
data['G042'] = 'Fred'
data['H003'] = 'Bess'

A101 -> Joe
G042 -> Fred
H003 -> Bess

The alternative approach is to create an empty dictionary with just the pair of curly
brackets and then to add the elements one at a time.
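
The two approaches end up with the same dictionary. A minimal sketch to check this, where `all_at_once` is a name invented for the comparison:

```python
data = {}                 # empty dictionary
data['A101'] = 'Joe'      # add the items one at a time
data['G042'] = 'Fred'
data['H003'] = 'Bess'

# The same content written as a single literal.
all_at_once = {'A101': 'Joe', 'G042': 'Fred', 'H003': 'Bess'}

# Two dictionaries are equal if they hold the same key/value pairs.
print(data == all_at_once)
```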


Example — 1

>>> data = {'A101':'Joe', 'F042':'Fred'}
>>> data
{'F042': 'Fred', 'A101': 'Joe'}

Order is not preserved!

So here's an example of a dictionary being created. Note that because there is no
numerical indexing there is no natural ordering either. Just because I enter the
key:value combinations in one order doesn't mean it stores them in that order.
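
This describes the Python 2.x versions covered by this course. From Python 3.7 onwards dictionaries do keep the order in which items were inserted, but it is still safest not to depend on any particular order. If a predictable order is needed, one option (a sketch, not the only way) is to ask for the keys in sorted order with the built-in sorted() function:

```python
data = {'F042': 'Fred', 'A101': 'Joe'}

# sorted() returns the keys as a new list in alphabetical order,
# whatever order the dictionary happens to store them in.
ordered_keys = sorted(data)
print(ordered_keys)
```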


Example  —  2 

>>> data['A101']
'Joe'
>>> data['A101'] = 'James'
>>> data
{'F042': 'Fred', 'A101': 'James'}

We get values out of a dictionary by quoting the corresponding key. The key appears
in square brackets, just as the index did with a list. We can change the value
corresponding to a key just as we did with a list.


Square  brackets  in  Python 

[...]            Defining literal lists
numbers[N]       Indexing into a list
numbers[M:N]     Slices
values[key]      Looking up in a dictionary

Note that while we use curly brackets to define a literal dictionary, we still use square
brackets to resolve a key to a value in it.

(The only reason Python uses curly brackets rather than square brackets for literal
dictionaries is that otherwise the Python interpreter would not be able to tell an
empty dictionary from an empty list: both would be written “[]”.)


Example  —  3 

>>> data['X123'] = 'Bob'
>>> data['X123']
'Bob'
>>> data
{'F042': 'Fred', 'X123': 'Bob', 'A101': 'James'}

We can add additional items into a dictionary using the same syntax as we used to
change them. Because there is no concept of order, there is no concept of
“appending”; we are just adding additional values.
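
Since adding and changing use the same syntax, what happens depends only on whether the key is already present. A short sketch using len() to watch the size of the dictionary; the replacement value 'Robert' is made up for illustration:

```python
data = {'F042': 'Fred', 'A101': 'James'}
size_before = len(data)

data['X123'] = 'Bob'        # new key: an item is added
size_after_add = len(data)

data['X123'] = 'Robert'     # existing key: the value is replaced
size_after_change = len(data)

print(size_before)
print(size_after_add)
print(size_after_change)
```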


Dictionaries 


Progress 

data = {'G042':('Fred',34), 'A101':('Joe',45)}

data['G042'] -> ('Fred', 34)

data['H003'] = ('Bess', 56)


Exercise

Write  a  script  that: 

1.  Creates  an  empty  dictionary,  “elements”. 

2.  Adds an entry 'H' -> 'Hydrogen'.

3.  Adds an entry 'He' -> 'Helium'.

4.  Adds an entry 'Li' -> 'Lithium'.

5.  Prints out the value for key 'He'.

6.  Tries to print out the value for key 'Be'.


Worked example — 1

Reading  a  file  to  populate  a  dictionary 


elements.txt (file)  ->  symbol_to_name (dictionary)

H   Hydrogen
He  Helium
Li  Lithium
Be  Beryllium
B   Boron
C   Carbon
N   Nitrogen
O   Oxygen
F   Fluorine
...


Let's  move  forward  from  that  example  to  look  at  populating  dictionaries  from  files. 

You have a file in your home directories called elements.txt. This contains 92 rows
of data in 2 columns: the symbol and name for each chemical element. We want to
create a dictionary called symbol_to_name that contains equivalent data.


Worked  example  —  2 

data = open('elements.txt')            # Open file

symbol_to_name = {}                    # Empty dictionary

for line in data:                      # Read data
    [symbol, name] = line.split()
    symbol_to_name[symbol] = name      # Populate dictionary

data.close()                           # Close file

Now ready to use the dictionary

Let's  see  how  we  would  do  it. 

We'll  start  with  an  open  file  to  read  the  data  from  and  an  empty  dictionary  to  write  the 
data  to. 

Then  we  run  through  the  data  in  the  file  a  line  at  a  time,  using  the  string  split()  method 
to  carve  the  line  up  into  its  two  components. 

For each line we take those two components as the key and value and add them to the
dictionary.

Finally,  after  running  through  the  file,  we  close  our  input  file. 
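
The sketch below runs those steps end to end. So that it does not depend on the real elements.txt, it first writes a three-line stand-in file in a temporary folder; the file name elements_sample.txt and the variable names `folder` and `sample` are assumptions made for this example only.

```python
import os
import tempfile

# Write a small stand-in for elements.txt so the sketch is self-contained.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'elements_sample.txt')
sample = open(filename, 'w')
sample.write('H Hydrogen\nHe Helium\nLi Lithium\n')
sample.close()

# The loop from the slide: one line in, one dictionary entry out.
symbol_to_name = {}
data = open(filename)
for line in data:
    # split() with no argument cuts the line at white space,
    # so 'He Helium\n' becomes ['He', 'Helium'].
    [symbol, name] = line.split()
    symbol_to_name[symbol] = name
data.close()

print(symbol_to_name['He'])

# Tidy up the temporary file and folder.
os.remove(filename)
os.rmdir(folder)
```

Note that the assignment to [symbol, name] expects exactly two items on every line. A blank line, or a name containing a space, would stop the script with a ValueError.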


Worked  example  —  3 

Reading  a  file  to  populate  a  dictionary 


names.txt  ->  key_to_name

A101  Joe
F042  Fred
X123  Bob
K876  Alice
J000  Maureen
A012  Stephen
X120  Peter
K567  Anthony
F041  Sally
...

We can do exactly the same to read in our keys -> names table too.


Worked  example  —  4 

data = open('names.txt')
key_to_name = {}

for line in data:
    [key, person] = line.split()
    key_to_name[key] = person

data.close()

The code is equivalent with just some names changed.


Make  it  a  function! 

symbol_to_name = {}

data = open('elements.txt')

for line in data:
    [symbol, name] = line.split()
    symbol_to_name[symbol] = name

data.close()

This is an obvious candidate to be made a function.


Make  it  a  function! 


symbol_to_name = {}

data = open('elements.txt')            # Input: the file name

for line in data:
    [symbol, name] = line.split()
    symbol_to_name[symbol] = name

data.close()

We know what our input should be: the file name.


Make it a function!


def filename_to_dict(filename):        # Input
    symbol_to_name = {}
    data = open(filename)

    for line in data:
        [symbol, name] = line.split()
        symbol_to_name[symbol] = name

    data.close()

So we can write the def line and modify the script to use the input variable.


Make  it  a  function! 


def filename_to_dict(filename):
    symbol_to_name = {}                # Output: the dictionary
    data = open(filename)

    for line in data:
        [symbol, name] = line.split()
        symbol_to_name[symbol] = name

    data.close()

We know what the output should be: the dictionary.


Make it a function!


def filename_to_dict(filename):
    x_to_y = {}                        # Output
    data = open(filename)

    for line in data:
        [x, y] = line.split()
        x_to_y[x] = y

    data.close()

So we can give that a nice generic name (x_to_y) and change the internal variables
to match (x and y).


Make  it  a  function! 


def filename_to_dict(filename):
    x_to_y = {}
    data = open(filename)

    for line in data:
        [x, y] = line.split()
        x_to_y[x] = y

    data.close()
    return(x_to_y)                     # Output

And we add the return line to hand back the dictionary.


Exercise 

1.  Write filename_to_dict()
in your utils module.

2.  Write a script that does this:

a.  Loads the file elements.txt as a dictionary
(This maps 'Li' -> 'Lithium' for example.)

b.  Reads each line of inputs.txt
(This is a list of chemical symbols.)

c.  For each line, prints out the element name

Here's a “skeleton” of the script you need to write:

# Read in the master dictionary
import utils

symbol_to_name = utils.filename_to_dict('elements.txt')

# Open the input file
input = ...

# Run through the data file one line at a time
for line in input:

    symbol = line.strip()

    # Look up the name of the element with this symbol
    name = ...
    print(name)

# Close the input file
input. ...


All  you  have  to  do  is  to  fill  in  the  three  blanks. 


Keys  in  a  dictionary? 


total_weight = 0

for symbol in symbol_to_name:
    name = symbol_to_name[symbol]
    print('%s\t%s' % (symbol, name))

“Treat it like a list”

How can we tell what keys are in a dictionary?

Specifically,  what  happens  when  we  want  to  run  through  all  the  keys  in  the  dictionary? 
We  use  the  Python  magic  of  “treat  it  like  a  list  and  it  behaves  like  a  list”.  In  the  case  of 
dictionaries  it  behaves  like  a  list  of  the  keys. 


“Treat  it  like  a  list” 

“Treat it like a list and it behaves like a (useful) list.”

File        ->  List of lines
String      ->  List of letters
Dictionary  ->  List of keys

We've seen this before. Files behave like lists of lines and strings behave like
lists of letters. Dictionaries behave like lists of keys.


“Treat  it  like  a  list” 


for item in list:
    blah blah
    ...item...
    blah blah

for key in dictionary:
    blah blah
    ...dictionary[key]...
    blah blah

So we can use a dictionary in a for... loop by looping through all its keys and then
looking up the corresponding value in the body of the loop.
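
Here is that pattern as a runnable sketch with a three-element dictionary. Rather than printing inside the loop, it collects the formatted lines in a list (called `lines` here, a name chosen for this example) and sorts them, because the order in which a dictionary hands over its keys is not something to rely on.

```python
symbol_to_name = {'H': 'Hydrogen', 'He': 'Helium', 'Li': 'Lithium'}

# Looping over a dictionary gives its keys; look each value up in the body.
lines = []
for symbol in symbol_to_name:
    name = symbol_to_name[symbol]
    lines.append('%s\t%s' % (symbol, name))

# Sort the formatted lines so the output order is predictable.
lines.sort()
for text in lines:
    print(text)
```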


Missing  key? 

>>> data = {'a':'alpha', 'b':'beta'}
>>> data['g']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'g'

Dictionary equivalent of “index out of range”

What happens if we ask for a key the dictionary doesn't have? We get an error, very
similar to the one we get if we ask for an out-of-range index in a list. Instead of
an “IndexError”, a dictionary raises a “KeyError”.
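
One way of coping with the error after the event, sketched here on the assumption that you are happy to use Python's try/except statement, is to catch the KeyError and substitute a fallback. Using None as the fallback is a choice made for this example.

```python
data = {'a': 'alpha', 'b': 'beta'}

try:
    value = data['g']     # raises KeyError: there is no key 'g'
except KeyError:
    value = None          # fallback used when the key is missing
print(value)
```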


“Treat  it  like  a  list” 


if item in list:
    blah blah
    ...item...
    blah blah

if key in dictionary:
    blah blah
    ...dictionary[key]...
    blah blah

So we need to be able to tell in advance whether a key is in a dictionary. We do this
using the “treat it like a list and it behaves like a list” magic. We can ask
    if key in dictionary
just like we can ask
    if item in list
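
A minimal sketch of the test in use: we only look a key up once we know it is there, so no KeyError can occur. The list `found` and the wording of the fallback message are invented for this example.

```python
data = {'a': 'alpha', 'b': 'beta'}

found = []
for key in ['a', 'g']:
    if key in data:
        found.append(data[key])               # safe: the key exists
    else:
        found.append('no entry for ' + key)   # skip the look-up entirely
print(found)
```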


Convert  to  a  list 

keys = list(data)
print(keys)
['b', 'a']

We can also convert a dictionary to a list of its keys explicitly, of course, with the
type converter function list().


Progress 

Keys  in  a  dictionary 
“Treat  it  like  a  list” 

list(dictionary) -> [keys]

for key in dictionary:
    ...

if key in dictionary:
    ...


Exercise

Write a function invert()
in your utils module.

symbol_to_name              'Li' -> 'Lithium'

name_to_symbol = invert(symbol_to_name)
name_to_symbol              'Lithium' -> 'Li'

Write a function that takes a dictionary as its argument and returns the “reversed”
dictionary as its result.

To  do  this  write  the  def  line  to  take  a  dictionary  x_to_y. 

Start  your  function  body  by  creating  an  empty  dictionary  y_to_x. 

Run  through  the  keys  of  x_to_y,  calling  the  key  x. 

For  each  x  look  up  the  corresponding  y  in  the  given  dictionary  x_to_y. 

For  that  x  and  y,  add  an  entry  to  the  y_to_x  dictionary. 

Once  the  loop  is  complete  return  the  y_to_x  dictionary. 


One  last  example 

Word  counting 

Given a text, what words appear and how often?

Let's finish with one last, serious example.

We  are  going  to  analyze  some  text  and  count  up  how  often  each  word  in  the  text 
appears. 


Word  counting  algorithm 

Run through file line-by-line
    Run through line word-by-word
        Clean up word
        Is word in dictionary?
            If not: add word as key with value 0
        Increment the counter for that word

Output words alphabetically

This is what we are going to do.

We  will  require  the  file  to  be  counted  to  be  given  on  the  command  line. 


Word  counting  in  Python:  1 


# Set up
import sys                   # Need sys for sys.argv

count = {}                   # Empty dictionary

data = open(sys.argv[1])     # Filename on command line

We need to import the sys module to get at the command line arguments in sys.argv.


Word  counting  in  Python:  2 

for line in data:                        # Lines
    for word in line.split():            # Words
        clean_word = cleanup(word)       # We need to write this function.

Next we run through the data pulling apart the words. This is a very crude analysis so
we suppose a simple function exists that cleans up a word: stripping any punctuation
that might have come along for the ride, converting everything to lower case, etc. We
will need to write this function.


Word  counting  in  Python:  3 

Insert at start of script

def cleanup(word_in):                    # “Placeholder” function
    word_out = word_in.lower()
    return word_out

Here's a simple cleanup function. All it does is convert the word to lower case.

If you want a better function that strips out punctuation, go to the course on “regular
expressions”.
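
In the meantime, a modest improvement is possible without regular expressions. The sketch below is one possible alternative, not the version used in the prepared script: it relies on the string method strip(), given the set of punctuation characters from the standard string module, to trim punctuation from both ends of the word. Punctuation inside a word, such as the apostrophe in “don't”, is left alone.

```python
import string

def cleanup(word_in):
    # Remove punctuation from both ends of the word, then lower-case it.
    word_out = word_in.strip(string.punctuation).lower()
    return word_out

print(cleanup('Treasure!'))
print(cleanup('"Island,"'))
print(cleanup("don't"))
```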


Word  counting  in  Python:  4 

        clean_word = cleanup(word)                   # Two levels indented

        if clean_word not in count:                  # Create new entry in dictionary?
            count[clean_word] = 0

        count[clean_word] = count[clean_word] + 1    # Increment count for word

Now  we  change  the  dictionary.  If  this  is  the  first  time  we  have  ever  seen  the  word,  we 
have  to  add  an  entry  to  the  dictionary.  Because  we  will  be  incrementing  the  dictionary 
value  in  a  moment  we  set  it  to  zero  on  creation. 
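
To see the create-then-increment step on its own, here is a sketch that feeds it a short, made-up list of already cleaned words instead of a file:

```python
count = {}

for clean_word in ['the', 'cat', 'the', 'hat', 'the']:
    if clean_word not in count:
        count[clean_word] = 0                    # first sighting: create the entry
    count[clean_word] = count[clean_word] + 1    # then count this occurrence

print(count['the'])
print(count['cat'])
```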


Word  counting  in  Python:  5 

        count[clean_word] = count[...

data.close()                 # Be tidy!

words = list(count)          # All the words

words.sort()                 # Alphabetical order

As soon as we have finished with our nested loops running through the words, we
close the data file.

Our  specification  wanted  us  to  run  through  the  words  in  alphabetical  order.  The  order 
we  get  them  from  a  dictionary  by  treating  it  like  a  list  is  essentially  random,  so  we 
create  a  list  and  then  sort  it.  This  is  the  alphabetically  ordered  list  of  all  the  words  that 
appear  once  or  more  in  the  file.  Each  word  only  appears  once  in  the  list;  the  frequency 
with  which  they  appear  in  the  data  file  is  in  the  dictionary  value. 


Word counting in Python: 6

words.sort()                                 # Alphabetical order

for word in words:
    print('%s\t%d' % (word, count[word]))

Then we print out all the words and values.


Run  it! 

$ python counter.py treasure.txt

What changes would you make to the script?

There is a script prepared for you with this worked example in it.

Run  it,  look  at  the  output.  Discuss  what  changes  you  would  make  to  improve  the 
script. 
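
One change you might consider, offered as a sketch rather than as the prepared script itself, is to move the counting into a function so that it can be tried out without a file. Here the function name `count_words`, the sample `text` and the list `report` are all invented for this example; a list of strings stands in for the open file, since both hand over one line at a time.

```python
def cleanup(word_in):
    # Placeholder clean-up, as in the worked example: lower case only.
    return word_in.lower()

def count_words(lines):
    """Return a dictionary mapping each cleaned-up word to its frequency."""
    count = {}
    for line in lines:
        for word in line.split():
            clean_word = cleanup(word)
            if clean_word not in count:
                count[clean_word] = 0
            count[clean_word] = count[clean_word] + 1
    return count

# A list of strings stands in for the lines of a data file.
text = ['The cat sat', 'on the mat']
count = count_words(text)

# Output the words alphabetically, one "word<TAB>count" line each.
words = list(count)
words.sort()
report = []
for word in words:
    report.append('%s\t%d' % (word, count[word]))
    print(report[-1])
```

To use it on a real file you could pass the open file in place of the list, for example count_words(data) with data = open(sys.argv[1]) as in the worked example.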


And  we're  done! 

Python  types 
Python  control  structures 
Python  functions 
Python  modules 

and now you are ready
to do things with Python!

And  that's  it! 


More  Python 

Python for Absolute Beginners
Python for Programmers

Follow-on courses:
    Python: Regular expressions
    Python: Further topics
    Python: Object oriented programming
    Python: Checkpointing
    Python: O/S access

Unless you want more, of course.


